Tuesday, June 30, 2009

What is Computational and Data Science?



Welcome to the blog for the Department of Computational and Data Sciences at George Mason University. In the coming weeks and months, we will explore a wide variety of topics related to our research and teaching, along with broader issues in our field. We plan to post about one entry per week, written by our faculty, students, alumni, and guest bloggers.

This blog is designed to help connect our department with the wider community working in Computational Science and Data Science. We hope that our entries will prompt discussion and help promote cross-disciplinary collaboration.

For the first entry in our blog, I want to address the most basic questions: what is Computational and Data Science, and why is it important?

Over the last few hundred years, the tools and techniques of science have grown steadily more sophisticated. The level of abstraction in our theories and the quality of our data have grown with our ability to transform basic concepts into complex instruments.

A good example of this transformation is medical imaging. Maxwell's electromagnetic theory from the 1800s was combined with basic concepts from early-1900s atomic physics to create magnetic resonance imaging (MRI). Even with this breakthrough technology, making true three-dimensional images of the human body was not possible until there was enough computational power to convert the instrument's raw data into an image. Using these images, colleagues in our department have created three-dimensional simulations of blood flow through the brains of individual patients. With these simulations, surgeons can make better-informed decisions about when to operate to repair cerebral aneurysms. The same combination of MRI scanners and computers is now being used in experimental economics at Mason to study how people make economic decisions.

Combined with computational power and technology, basic theory became the foundation for new tools. Those tools have since been used in unexpected ways, as our ability to analyze and model the data has grown with increases in computational power.

Across the sciences, computers are now routine tools. A theoretical physicist, for example, regularly uses Mathematica or Matlab to find numerical solutions to ordinary and partial differential equations (ODEs and PDEs). At the same time, an experimental physicist uses automated data acquisition hardware to capture data during an experiment and to analyze the results. We see similar uses of computers across the sciences and engineering, and increasingly in the social sciences. This leads to an interesting question: if computers are used everywhere, can we really say that Computational Science is something separate from disciplines like physics and biology?

The sciences borrow ideas and techniques from each other all the time. Scientists across the disciplines talk about "using a mathematical model," "using physics," or "using statistics." Yet even though physics, mathematics, and statistics are integrated into other disciplines, each remains a separate academic field in its own right. Statisticians don't consider themselves biologists just because a biologist uses statistical tools, nor do biologists consider themselves statisticians because they use statistics. The same is true of Computational Science.

Just as with mathematics, statistics, and physics, most uses of the Computational and Data Sciences are relatively simple. Solving an ODE numerically, doing simple data analysis, graphing data, or setting up a simple scientific database are all part of our discipline, but they sit at the simpler end of the spectrum of what CDS scientists do. We still use basic tools like Matlab at times, but we spend more of our time both using and developing advanced tools to solve more complex problems. At least from my point of view, the difference between using Matlab and developing a parallel code that uses MPI (the Message Passing Interface) is perhaps the difference between using the tools of Computational Science and being a Computational Scientist. Similarly, the difference between using Excel to analyze data and building a system that handles tens of terabytes per day is the difference between using the tools of Data Science and being a Data Scientist.
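
As a rough illustration of what "the simpler end of the spectrum" can look like, here is a minimal sketch of a numerical ODE solution: the forward Euler method applied to exponential decay, dy/dt = -k*y. The choice of equation, the parameter values, and the function name euler_decay are our own assumptions for illustration, not code from any project in the department.

```python
import math


def euler_decay(y0: float, k: float, t_end: float, steps: int) -> float:
    """Approximate y(t_end) for dy/dt = -k*y with y(0) = y0 (illustrative only).

    Uses forward Euler with a fixed step size; 'euler_decay' is a
    hypothetical helper written for this example.
    """
    dt = t_end / steps
    y = y0
    for _ in range(steps):
        # Forward Euler update: y_{n+1} = y_n + dt * f(y_n), where f(y) = -k*y
        y += dt * (-k * y)
    return y


if __name__ == "__main__":
    approx = euler_decay(1.0, 0.5, 2.0, 1000)
    exact = math.exp(-0.5 * 2.0)  # analytic solution y0 * exp(-k * t)
    print(f"Euler: {approx:.5f}  exact: {exact:.5f}  error: {abs(approx - exact):.2e}")
```

A dozen lines like these are the kind of task the paragraph above places at the simple end; the harder work lies in scaling such calculations to large, coupled, or parallel problems.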

A recent report to the President entitled "Computational Science: Ensuring America's Competitiveness" outlines some of the challenges we are facing in this field. This report states:
"Though the information technology-powered revolution is accelerating, this country has not yet awakened to the central role played by computational science and high-end computing in advanced scientific, social science, biomedical, and engineering research; defense and national security; and industrial innovation... While it [Computational Science] is itself a discipline, computational science serves to advance all of science. The most scientifically important and economically promising research frontiers in the 21st century will be conquered by those most skilled with advanced computing technologies and computational science applications."

The principal recommendation of this report was:
"Universities and the Federal government's R&D agencies must make coordinated, fundamental, structural changes that affirm the integral role of computational science in addressing the 21st century's most important problems, which are predominantly multidisciplinary, multi-agency, multi-sector, and collaborative...."

Of course, we in the Department of Computational and Data Sciences couldn't agree more.

-John Wallin
