A Rice University-led team of software experts has launched an $11 million effort to create a sophisticated tool called PLINY that they believe could make writing computer programs as easy as searching the Internet.
What they are proposing is a four-year effort to build an AI-based data-mining engine, grounded in Bayesian statistics, that will draw on a repository of open-source software to build a database of code. That database will power a tool able to both “autocomplete” and “autocorrect” code for programmers.
“Imagine the power of having all the code that has ever been written in the past available to programmers at their fingertips as they write new code or fix old code,” said Vivek Sarkar, Rice's E.D. Butcher Chair in Engineering, chair of the Department of Computer Science and the principal investigator (PI) on the PLINY project. “You can think of this as autocomplete for code, but in a far more sophisticated way.”
Sarkar said the four-year effort is funded by the Defense Advanced Research Projects Agency (DARPA). PLINY, which draws its name from the Roman naturalist who authored the first encyclopedia, will involve more than two dozen computer scientists from Rice, the University of Texas-Austin, the University of Wisconsin-Madison and the company GrammaTech.
PLINY is part of DARPA's Mining and Understanding Software Enclaves (MUSE) program, an initiative that seeks to gather hundreds of billions of lines of publicly available open-source computer code and to mine that code to create a searchable database of properties, behaviors and vulnerabilities.
“Software today is far more complex than it was 20 years ago, yet it is still largely created by hand, one line of code at a time,” said project investigator Swarat Chaudhuri, assistant professor of computer science at Rice. “We envision a system where the programmer writes a few lines of code, hits a button and the rest of the code appears. And not only that, the rest of the code should work seamlessly with the code that's already been written.”
He said PLINY will need to be sophisticated enough to recognize and match similar patterns regardless of differences in programming languages and code specifications. The system will have to explore different ways of interweaving code retrieved through search into a programmer's partially completed draft program and analyze the resulting code to make sure that it does not have bugs or security flaws.
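To give a rough sense of what matching "similar patterns" can mean in practice, here is a minimal sketch, not PLINY's actual method. It handles only one narrow case: two Python snippets that differ solely in their variable names. The `fingerprint` function and the placeholder naming scheme are hypothetical, introduced purely for illustration. Matching across different programming languages, as the article describes, would require far more than this.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Renames every identifier to a placeholder (v0, v1, ...) in order of first use."""

    def __init__(self):
        self.names = {}

    def _placeholder(self, name):
        # The same original name always maps to the same placeholder.
        if name not in self.names:
            self.names[name] = f"v{len(self.names)}"
        return self.names[name]

    def visit_Name(self, node):
        # Note: this simple sketch also renames builtins such as `len`.
        new_node = ast.Name(id=self._placeholder(node.id), ctx=node.ctx)
        return ast.copy_location(new_node, node)

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node


def fingerprint(source):
    """Return a structural fingerprint that ignores the choice of variable names."""
    tree = _Normalizer().visit(ast.parse(source))
    return ast.dump(tree)


loop_a = "total = 0\nfor x in items:\n    total += x"
loop_b = "s = 0\nfor v in data:\n    s += v"
loop_c = "s = 0\nfor v in data:\n    s -= v"

same_pattern = fingerprint(loop_a) == fingerprint(loop_b)
different_pattern = fingerprint(loop_a) != fingerprint(loop_c)
```

Here `loop_a` and `loop_b` share a fingerprint because they differ only in naming, while `loop_c` does not because it subtracts rather than adds.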
The core of the system will be a data-mining engine that continuously scans the massive repository of open-source code. The engine will leverage the latest techniques in deep program analyses and big-data analytics to populate and refine a database that can be queried whenever a programmer needs help finishing or debugging a piece of code.
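The idea of a database that is populated from existing code and queried when a programmer needs help can be illustrated with a toy inverted index. This is a simplified sketch under stated assumptions, not the project's design. The `SnippetIndex` class, its methods and the token-overlap scoring are all hypothetical, and a real system would rely on program analysis rather than bare identifier matching.

```python
import re
from collections import defaultdict

# Matches identifiers and keywords; punctuation and literals are ignored.
TOKEN = re.compile(r"[A-Za-z_]\w*")


class SnippetIndex:
    """Toy inverted index: maps each token to the snippets that contain it."""

    def __init__(self):
        self.snippets = []
        self.postings = defaultdict(set)

    def add(self, code):
        """Store a snippet and index its distinct tokens."""
        snippet_id = len(self.snippets)
        self.snippets.append(code)
        for token in set(TOKEN.findall(code)):
            self.postings[token].add(snippet_id)
        return snippet_id

    def query(self, partial_code, limit=3):
        """Return stored snippets sharing the most tokens with the draft code."""
        hits = defaultdict(int)
        for token in set(TOKEN.findall(partial_code)):
            for snippet_id in self.postings.get(token, ()):
                hits[snippet_id] += 1
        # Most shared tokens first; ties broken by insertion order.
        ranked = sorted(hits, key=lambda i: (-hits[i], i))
        return [self.snippets[i] for i in ranked[:limit]]


index = SnippetIndex()
index.add("with open(path) as f:\n    data = f.read()")
index.add("import json\ndata = json.loads(text)")
index.add("for item in items:\n    print(item)")

suggestions = index.query("f = open(path)")
```

In this toy run, only the file-reading snippet shares tokens with the draft line, so it is the only suggestion returned.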
“The engine will formulate answers using Bayesian statistics,” said another project investigator, Chris Jermaine, associate professor of computer science at Rice. “Much like today's spell-correction algorithms, it will deliver the most probable solution first, but programmers will be able to cycle through possible solutions if the first answer is incorrect.”
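The "most probable solution first" behavior Jermaine describes can be sketched with Bayes' rule on a toy data set. What follows is an illustrative assumption about how such ranking might work, not the PLINY engine itself. The `rank_completions` function, the (context, completion) pair format and the add-one smoothing are all hypothetical choices made for the example.

```python
from collections import Counter


def rank_completions(corpus, context, smoothing=1.0):
    """Rank candidate completions by posterior probability P(candidate | context).

    corpus: list of (context, completion) pairs, standing in for mined code.
    Bayes' rule: P(candidate | context) is proportional to
    P(context | candidate) * P(candidate), with additive smoothing.
    """
    completion_counts = Counter(completion for _, completion in corpus)
    pair_counts = Counter(corpus)
    contexts = {ctx for ctx, _ in corpus} | {context}
    total = sum(completion_counts.values())

    scores = {}
    for candidate, count in completion_counts.items():
        prior = count / total
        likelihood = (pair_counts[(context, candidate)] + smoothing) / (
            count + smoothing * len(contexts)
        )
        scores[candidate] = prior * likelihood

    # Normalize so the posteriors sum to 1, then sort most probable first.
    evidence = sum(scores.values())
    return sorted(
        ((candidate, score / evidence) for candidate, score in scores.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Toy "mined" observations: which argument followed which call.
corpus = [
    ("open(", "path"),
    ("open(", "path"),
    ("open(", "filename"),
    ("print(", "message"),
    ("print(", "path"),
]

ranked = rank_completions(corpus, "open(")
best_candidate = ranked[0][0]
```

A programmer who rejects `best_candidate` would simply move on to `ranked[1]`, `ranked[2]` and so on, which mirrors the cycling through alternatives described in the quote.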