Wednesday, 22 January 2025

Big Data and the Future of the Scientific Method

[This blog post is based on a chapter in my new book ‘REVISITING THE SCIENTIFIC METHOD: The Need to Make Science More Inclusive in Scope’ (Wadhawan 2025).]

A problem, as well as a boon, of modern times is that we have an overwhelming amount of data at our disposal. The term ‘Big Data’ has been coined for this. Big Data means a complex and voluminous body of information, comprising structured, unstructured, and semi-structured datasets, that is challenging to manage using traditional data-processing tools and that requires additional infrastructure to govern, analyse, and convert into insights or knowledge. It is characterised by ‘volume’, ‘velocity’, ‘variety’, ‘veracity’, and ‘value’ (BasuMallik 2022; Tiao 2024).

Credit: The Role of Big Data in Scientific Research | LinkedIn

The volume of data at our disposal is now measured in zettabytes (10²¹ bytes) and yottabytes (10²⁴ bytes), and it is growing exponentially. ‘A Big Data system has two closely related parts: storage and analysis. Analysis is based on storage, which facilitates access. Storage is based on analysis, which reduces volume. Analytical solutions that really respond to this problem have two features: induction and speed’ (Malle 2013).

A feature of Big Data is that just about any correlation can be found in it. This puts the dictum ‘correlation does not necessarily imply causation’ under strain, and we have to find sophisticated ways of deciding whether or not to take a given correlation seriously.

‘The end of theory: The data deluge makes the scientific method obsolete’ was the attention-grabbing title of a provocative and much-discussed article by the then Editor-in-Chief of Wired magazine, Chris Anderson (2008). The article begins by quoting the statistician George Box, who had written in 1976 that ‘all models are wrong, but some are useful’. The justification for such a statement comes from the nature, or rather the limitation, of deductive logic. Any model or hypothesis or theory is a generalisation based on the assumption that its underlying axioms are true. From the model we draw conclusions by deductive logic, and then go about checking their validity/falsifiability against the available information. As more and more information pours in with the passage of time, the model may fall by the wayside, to be replaced by a better model. This has happened several times in the history of science. Some examples: Newton’s models of gravity and of space (replaced by Einstein’s theory of spacetime); the classical notion of the simultaneity of events (replaced by Einstein’s theory of relativity); classical mechanics (superseded by quantum mechanics); theories of human behaviour (modified or replaced again and again). The new models may, in turn, prove to be inadequate or ‘wrong’, to be replaced by still better or newer models in the light of additional information that keeps pouring in. And so on. So, any model is liable to be ‘wrong’.

As information has gone on piling up, a stage has come when ‘information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later’ (Anderson 2008). Anderson gave the example of the success achieved by Google in the advertising world: ‘Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right. Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually "knowing" them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content’. He quoted Google's research director Peter Norvig as saying: ‘All models are wrong, and increasingly you can succeed without them’.

This is a far cry from the traditional way of doing science, which is based on testable hypotheses. But we must admit that there is indeed a stalemate of sorts in physics at present. No big conceptual breakthroughs have come for quite some time now. We have string theory, which we are unable to verify. It has not been possible to unify quantum mechanics with the theory of gravity. And then there is dark matter. And so on. Do we need to change tracks altogether? Many people think that we do. Here is Anderson’s (2008) take on this: ‘The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the "beautiful story" phase of a discipline starved of data) is that we don't know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on. Now biology is heading in the same direction. The models we were taught in school about "dominant" and "recessive" genes steering a strictly Mendelian process have turned out to be an even greater simplification of reality than Newton's laws. The discovery of gene-protein interactions and other aspects of epigenetics has challenged the view of DNA as destiny and even introduced evidence that environment can influence inheritable traits, something once considered a genetic impossibility’. He is overstating a bit, but still has a point there.

Anderson’s way out was to assert that correlations are enough, and that knowledge about the causes behind the observed correlations can take a back seat to start with. He advocated a new kind of science: data-driven science. In it, we can stop looking for models or hypotheses. We can analyse the data without any preconceived hypotheses or biases about what it might show. We throw the numbers into big computing clusters and let sophisticated statistical algorithms find patterns where the existing science cannot. Such an approach is particularly relevant for bioinformatics, systems biology, epidemiology, ecology, etc. But can it replace the usual hypothesis-driven way of doing science? Not really, though it can be an important adjunct to it. Below I discuss some reactions to the points made by Anderson.
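Before turning to those reactions, here is a toy sketch (in Python, my own illustration rather than anything Anderson specifies) of what ‘letting the algorithm find the pattern’ can look like: a simple clustering algorithm recovers groupings in unlabeled data without being given any hypothesis about what the groups mean.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means clustering: group unlabeled points using only their statistics."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre.
        labels = np.argmin(((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1), axis=1)
        # Move each centre to the mean of the points assigned to it (keep it if its cluster is empty).
        centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
    return labels, centres

# Unlabeled data secretly drawn from three groups; the algorithm recovers the grouping
# without any prior hypothesis about what the groups are.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2)) for c in ([0, 0], [3, 3], [0, 4])])
labels, centres = kmeans(X, k=3)
print(np.round(centres, 2))   # close to the three hidden group centres
```

Whether the discovered clusters mean anything is, of course, exactly the question Anderson’s critics raise; the algorithm itself cannot say.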

Mazzocchi (2015) examined the issue from the epistemological point of view. He asked: ‘Is data-driven research a genuine mode of knowledge production, or is it above all a tool to identify potentially useful information? Given the amount of scientific data available, is it now possible to dismiss the role of theoretical assumptions and hypotheses?’ He began by pointing out that, long before Anderson (2008), Francis Bacon had argued, as far back as 1620 in his Novum Organum, that ‘scientific knowledge should not be based on preconceived notions but on experimental data. Deductive reasoning, he argued, is eventually limited because setting a premise in advance of an experiment would constrain the reasoning so as to match that premise. Instead, he advocated a bottom-up approach: In contrast to deductive reasoning, which has dominated science since Aristotle, inductive reasoning should be based on facts to generalize their meaning, drawing inferences from observations and data’.

Thus Big-Data-based science has renewed the primacy of inductive reasoning, in the form of technology-based empiricism. Some believe that this hypothesis-neutral way of creating knowledge will replace the hypothesis-driven way of doing research. Data mining can throw up unexpected correlations and patterns, which can then be used to generate new hypotheses about the causes behind the correlations (see Hassanien et al. 2015). So the new computational route ends up doing hypothesis-generating research rather than hypothesis-testing research.

Inductive algorithms occupy centre-stage here. Mazzocchi quotes Malle (2013): ‘Inductive reasoning generally produces no finished status. The results of inferences are likely to alter the inferences already made. It is possible to continue the reasoning indefinitely. The best inductive algorithms can evolve: they “learn”, they refine their way of processing data according to the most appropriate use which can be made. Permanent learning, never completed, produces imperfect but useful knowledge. Any resemblance with the human brain is certainly not a coincidence’. [As Malle explains, ‘induction, unlike deduction, is a mechanism used by the human brain at almost every moment. Indeed, despite the fact that deduction is considered as cleaner, more scientific, it occupies only a small portion of the processing time of our brain. It is particularly relevant when analysing a situation out of its context’.]

Mazzocchi and many others have expressed concerns that unscrupulous agents can influence what kind of Big Data is generated and made public, to the exclusion of information inimical to their interests or designs. Think of the Dark web. Perhaps we never get the ‘full’ picture at any point of time in our history. And, even when the intentions are not questionable and there is no manipulation, it is wrong to presume data-neutrality, even in good science. Data are not collected randomly. Experiments are designed and carried out within theoretical, methodological and instrumental limitations. There always are hypotheses and assumptions at play. Data collection is seldom unbiased. ‘Scientific research is carried out by human beings whose cognitive stance has been formed by many years of incorporating and developing cultural, social, rational, disciplinary ideas, preconceptions and values, together with practical knowledge. Scientists form their ideas and hypotheses based on specific theoretical and disciplinary backgrounds, which again are the result of decades or even centuries of history of scientific and philosophical thought’ (Mazzocchi 2015).

Calude and Longo (2017) delivered a severe blow to those Big Data enthusiasts who had been predicting the ‘end of the scientific method’, asserting that hypothesis-free, computer-discovered correlations found in an apparently ‘unbiased’ way from Big Data are enough for the advancement of knowledge, and that we can ignore all preconceived causation aspects of any observation or data. Calude and Longo used classical results from ergodic theory, Ramsey theory, and algorithmic information theory to prove that very large databases must contain arbitrary correlations, and that these correlations or ‘regularities’ appear only because of the large size, and not the nature, of the data. Such correlations can be found even in ‘randomly generated, large-enough databases’. They proved that most correlations are spurious. The scientific method can be enriched by data mining in very large databases, but not replaced by it.
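The flavour of their result is easy to see numerically. The sketch below (my own illustration in Python; it is not their proof, which rests on Ramsey theory and algorithmic information theory) generates pure noise with many variables and a modest number of observations, and then counts how many variable pairs nevertheless show a ‘strong’ correlation purely by chance.

```python
import numpy as np

rng = np.random.default_rng(42)
n_vars, n_obs = 2000, 40                      # many variables, few observations
data = rng.normal(size=(n_vars, n_obs))       # pure noise: no real relationships exist

corr = np.corrcoef(data)                      # all pairwise correlation coefficients
upper = corr[np.triu_indices(n_vars, k=1)]    # each pair counted once

print("pairs examined:      ", upper.size)
print("pairs with |r| > 0.5:", int((np.abs(upper) > 0.5).sum()))
```

With roughly two million pairs to inspect, typically a few thousand of them show |r| > 0.5 even though every ‘signal’ here is random noise; the spurious correlations are a consequence of the sheer size of the search, exactly as Calude and Longo argue.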

I discuss next the work of Succi and Coveney (2019), which I believe to be particularly important because it addresses the impact of Big Data on the science of complex systems. A very large chunk of activity in modern science is about complex systems. This century is the century of complexity, as was predicted by Stephen Hawking. Succi and Coveney point out that ‘once the most extravagant claims of Big Data are properly discarded, a synergistic merging of Big Data with big theory offers considerable potential to spawn a new scientific paradigm capable of overcoming some of the major barriers confronted by the modern scientific method originating with Galileo. These obstacles are due to the presence of nonlinearity, nonlocality and hyperdimensions which one encounters frequently in multiscale modelling’. They make the following four points:

1. Complex systems do not (generally) obey Gaussian statistics. This is almost a defining feature of complex systems. Their strongly correlated nature makes them generally obey power-law statistics (see Wadhawan 2018). This means that the law of large numbers (so characteristic of Gaussian statistics) generally does not hold for complex systems: for them, an increase in sample size (a major feature of Big Data) is no guarantee that the error or uncertainty in the estimated or mean value decreases monotonically. In a system obeying Gaussian statistics, the chances of occurrence of an event far from the mean value are small; the bell-shaped Gaussian curve has a small tail. Not so for power-law statistics; the tail may be far from small. ‘This explains why the Big Data trumpets should be toned down: when rare events are not so rare, convergence rates can be frustratingly slow even in the face of petabytes of data’. (A small numerical sketch of this slow convergence is given just after this list.)

2. No data is big enough for systems with strong sensitivity to data inaccuracies. The evolution of a complex or chaotic system is well known to be a very sensitive function of ‘initial conditions’ or ‘data inaccuracies’ (see Wadhawan 2018). Think of the ‘butterfly effect’. This undercuts Big Data radicalism, whose main claim is that we can extract patterns from data, or discover correlations between phenomena we never thought of as connected, simply because of the large sample sizes. (A second sketch after this list illustrates how quickly such sensitivity destroys predictability.)

3. Correlation does not imply causation, and the link between the two becomes exponentially fainter as data sizes increase. Correlations between two sets of data or signals can be either true correlations (TCs) or false correlations (FCs). A TC indicates a causative relationship or connection; an FC is one that just happens to be observed, for no underlying reason: a ‘spurious’ correlation. But distinguishing between a true and a false correlation can be tough at times. And the problem is compounded by the fact that as data sizes grow, false correlations become more and more common (as proved by Calude and Longo 2017). It is also true that, as shown by Meng (2014), to make statistically reliable inferences one needs access to more than 50% of the data on which to perform one’s machine learning for detecting patterns or correlations. According to Succi and Coveney (2019), what we need are ‘many more theorems that reliably specify the domain of validity of the methods and the amounts of data to produce statistically reliable conclusions’. They cite a paper by Karbalayghareh et al. (2018) as a step in the right direction.

4. In a finite-capacity world, too much data is just as bad as no data. We extract information from data, knowledge from information, and wisdom from knowledge. The step from knowledge to wisdom may well involve hypothesising a model for the underlying cause(s). And the wisdom gained can be utilised for optimising the model by a repetitive, circular process of reasoning. Offhand we might think that an expanded database should lead to a corresponding increase in information, knowledge, and wisdom, in a linear sort of way. But the linearity is not guaranteed, particularly for complex systems. Succi and Coveney (2019) make the point that for finite complex systems a state of nonlinear saturation is reached sooner or later as more and more data pour in: ‘This is the very general competition-driven phenomenon by which increasing data supply leads to saturation and sometimes even loss of information; adding further data actually destroys information. … Beyond a certain threshold, further data does not add any information, simply because additional data contain less and less new information, and ultimately no new information at all. … We speculate, without proof, that this is a general rule in the natural world’. [Destruction of information occurs if new and old data are mutually contradictory.]
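To make point 1 concrete, here is a small simulation (my own illustration, not taken from Succi and Coveney): the running mean of a Gaussian sample converges quickly, while for a heavy-tailed power-law sample with infinite variance the error shrinks frustratingly slowly even with a million data points. The Pareto distribution with tail exponent 1.5 stands in here for a generic power-law system.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

gaussian = rng.normal(loc=1.0, scale=1.0, size=n)   # true mean 1.0, thin tails
pareto = rng.pareto(1.5, size=n) + 1.0              # true mean 1.5/(1.5-1) = 3.0, infinite variance

for label, x, true_mean in [("Gaussian   ", gaussian, 1.0), ("Pareto(1.5)", pareto, 3.0)]:
    running_mean = np.cumsum(x) / np.arange(1, n + 1)
    for k in (10**3, 10**5, 10**6):
        err = abs(running_mean[k - 1] - true_mean)
        print(f"{label} n = {k:>7d}   |running mean - true mean| = {err:.4f}")
```

Typically the Gaussian error falls steadily towards zero, while the Pareto error shrinks far more slowly and erratically, because occasional enormous values keep perturbing the average.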

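And here is an equally minimal sketch of point 2, using the logistic map as a stand-in for a chaotic system (again my own example, not one from the paper): two trajectories that start a mere one part in a million apart become completely different within a few dozen steps, so no amount of additional data can compensate for that tiny initial inaccuracy.

```python
# Sensitivity to initial conditions in the chaotic logistic map x -> r*x*(1 - x) with r = 4.
r = 4.0
x, y = 0.200000, 0.200001      # two initial conditions differing by one part in a million

for step in range(1, 51):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
    if step % 10 == 0:
        print(f"step {step:2d}: |x - y| = {abs(x - y):.6f}")
```

By roughly twenty steps the two trajectories are essentially unrelated, even though the ‘data inaccuracy’ was only one part in a million.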
Here are some final conclusions in the essay by Succi and Coveney (2019): ‘There is no doubt that the “Big Data/machine learning/artificial intelligence” approach has plenty of scope to play a creative and important role in addressing major scientific problems. Among the applications, pattern recognition is particularly powerful in detecting patterns which might otherwise remain hidden indefinitely (modulo the problem of false positives). Possibly the most important role is likely to be in establishing patterns which then demand further explanation, where scientific theories are required to make sense of what is discovered.  … Instead of rendering theory, modelling and simulation obsolete, Big Data should and will ultimately be used to complement and enhance it. Examples are flourishing in the current literature, with machine learning techniques being embedded to assist large-scale simulations of complex systems in materials science, turbulence, and also to provide major strides towards personalised medicine, a prototypical problem for which statistical knowledge will never be a replacement for patient-specific modelling. It is not hard to predict that major progress may result from an inventive blend of the two, perhaps emerging as a new scientific methodology’.

In the beginning we had theoretical science and experimental science. The advent of computers gave rise to a third kind of science, namely computational science: it often happens that the mathematical formulation of a model involves differential equations that are too difficult, if not impossible, to solve analytically. The availability of powerful computers enables us to formulate the model in terms of difference equations instead, which can be solved to a desired degree of accuracy. The cellular-automata approach is an example of computational science (see Wadhawan 2018); a minimal sketch of the difference-equation idea is given below. And now the data deluge has given rise to a fourth type of science, data-driven science: find the correlations first, and then go about looking for the reasons or causes behind the observed correlations. Big Data analytics plays a big role in this (see What is Big Data Analytics? | IBM).
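Here is that sketch: the differential equation dy/dt = -k·y (exact solution y0·e^(-kt)) is replaced by the difference equation y_(n+1) = y_n + Δt·(-k·y_n), which a computer can iterate directly (the Euler method; the numbers below are arbitrary illustrative choices).

```python
import math

# Solve dy/dt = -k*y numerically by stepping the difference equation y_{n+1} = y_n + dt*(-k*y_n).
k, y0, dt, t_end = 0.5, 1.0, 0.01, 5.0
n_steps = int(t_end / dt)

y = y0
for _ in range(n_steps):
    y += dt * (-k * y)          # one difference-equation (Euler) step

print(f"numerical  y({t_end}) = {y:.5f}")
print(f"analytical y({t_end}) = {y0 * math.exp(-k * t_end):.5f}")   # exact solution for comparison
```

Shrinking Δt makes the numerical answer approach the analytical one; the same idea, scaled up enormously, underlies multiscale simulations of complex systems.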

Big Data theory is already a field of research in its own right. It is a set of generalized principles that explain the foundations, knowledge, and methods used in the practice of data-driven science (Big Data Theory | SpringerLink).

Not only ‘Big Data theory’, but also theories from Big Data! Yes, physical theories. ‘The Next Einstein: New AI Can Develop New Theories of Physics’ is the title of an article by Juelich (2024), in which the work of Merger et al. (2023) is described. Many scientists produce a large amount of data through their research. We may call them data producers. Then, once in a while, comes along a smart scientist who is able to see in the published data a common trend or pattern of great fundamental importance, leading to a leap of progress in science. A great example is that of J. C. Maxwell, whose Maxwell equations are textbook stuff. He not only unified the work of many earlier scientists, but also discovered ‘displacement current’ in the process. Some other examples are those of Newton, Einstein, and (P. W.) Anderson. They were data users rather than data producers. Efforts are afoot at present to develop AI that can play the game-changing role of the data-user type of scientist. The work of Merger et al. (2023) is a step in that direction. Their paper has the title ‘Learning Interacting Theories from Data’. I quote from its Abstract: ‘One challenge of physics is to explain how collective properties arise from microscopic interactions. Indeed, interactions form the building blocks of almost all physical theories and are described by polynomial terms in the action. The traditional approach is to derive these terms from elementary processes and then use the resulting model to make predictions for the entire system. But what if the underlying processes are unknown? Can we reverse the approach and learn the microscopic action by observing the entire system? We use invertible neural networks (INNs) to first learn the observed data distribution. By the choice of a suitable nonlinearity for the neuronal activation function, we are then able to compute the action from the weights of the trained model; a diagrammatic language expresses the change of the action from layer to layer. This process uncovers how the network hierarchically constructs interactions via nonlinear transformations of pairwise relations.’

In other words, it is a top-down approach, rather than a bottom-up approach for doing science. The data are the end results of the microscopic interactions. They lie at the top of the hierarchy, at the bottom of which are the microscopic interactions which have given rise to the data we observe. Machine learning has been used for generating a theory, without any prior knowledge about the nature of the microscopic interactions involved. I quote from their paper again: ‘Key to this approach is the use of a generative neural network, which maps a complicated data distribution to a simpler one. By decomposing this mapping into interactions between simpler features, we can better understand how and why models make predictions. We hence unravel the complex, hierarchical structure that has been learned by a neural network and explain it in a form that is central to physics: interactions between degrees of freedom’.
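To give a feel for the central ingredient, here is a minimal sketch of an invertible (coupling-layer) transformation of the kind used in such networks. This is not the network of Merger et al. (2023); it is a generic, untrained, RealNVP-style affine coupling layer written in plain Python/numpy, shown only to make concrete what ‘invertible’ means: the forward mapping can be undone exactly, which is what allows one to work backwards from the learned model towards the structure that generated the data.

```python
import numpy as np

def coupling_forward(x, s_w, t_w):
    """One affine coupling layer: transform the second half of x, conditioned on the first half."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s = np.tanh(s_w @ x1)              # scale factors, computed only from x1
    t = t_w @ x1                       # translations, computed only from x1
    y2 = x2 * np.exp(s) + t
    return np.concatenate([x1, y2])

def coupling_inverse(y, s_w, t_w):
    """Exact inverse: because x1 passes through unchanged, s and t can be recomputed from it."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s = np.tanh(s_w @ y1)
    t = t_w @ y1
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2])

rng = np.random.default_rng(0)
d = 4
s_w = rng.normal(size=(d // 2, d // 2))    # stand-in 'weights'; in a real INN these are trained
t_w = rng.normal(size=(d // 2, d // 2))

x = rng.normal(size=d)
y = coupling_forward(x, s_w, t_w)
print(np.allclose(coupling_inverse(y, s_w, t_w), x))   # True: the mapping is exactly invertible
```

Stacking many such layers (with the halves swapped between layers) gives a flexible yet exactly invertible map between the data distribution and a simpler one, which is the property Merger et al. exploit when computing the action from the trained weights.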

INNs may also prove applicable in a variety of other fields: genomics, epidemiology, condensed-matter physics, astrophysics, climate modelling, ecology, economics, sociology, and neuroscience.

‘Big Data marked a break in the evolution of information systems from three points of view: the explosion of available data, the increasing variety of these data, and their constant renewal. Processing these data demands more than just computing power. It requires a complete break from Cartesian logic. It calls for the non-scientific part of human thought: inductive reasoning’ (Malle 2013).

‘Big Data, distributed computing and sophisticated data analysis all played a crucial role in the discovery of the Higgs boson—and perhaps in finding new ‘patterns’ they might also generate new hypotheses in this field. But the discovery of the Higgs boson was not data-driven. The collider experiments were mostly driven by theoretical predictions: It is because scientists were attempting to confirm the Standard Model of elementary particles that the discovery of the Higgs boson—the only missing piece—could occur’ (Mazzocchi 2015).

==

References cited

Anderson, C. (2008). ‘The end of theory: The data deluge makes the scientific method obsolete’. Wired magazine, 16(7), 16. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete (statlit.org)

BasuMallik, C. (2022). ‘What Is Big Data? Definition, Types, Importance, and Best Practices’. What Is Big Data? Definition and Best Practices (spiceworks.com).

Calude, C.S. and G. Longo (2017). ‘The Deluge of Spurious Correlations in Big Data’. Foundations of Science, 22, 595–612. https://doi.org/10.1007/s10699-016-9489-4. The Deluge of Spurious Correlations in Big Data | Foundations of Science (springer.com)

Hassanien, A. E. et al. (Eds.) (2015). Big Data in Complex Systems: Challenges and Opportunities. Big Data in Complex Systems: Challenges and Opportunities | SpringerLink.

Juelich, F. (12 March 2024). ‘The Next Einstein: New AI Can Develop New Theories of Physics’. SciTechDaily. The Next Einstein: New AI Can Develop New Theories of Physics (scitechdaily.com)

Malle, J.-P. (2013). ‘Big Data: Farewell to Cartesian Thinking?’. ParisTech Review. http://www.paristechreview.com/2013/03/15/big-data-cartesian-thinking/

Mazzocchi, F. (2015). ‘Could Big Data be the end of theory in science? A few remarks on the epistemology of data-driven science’. EMBO Reports, 16(10), 1250–1255. Could Big Data be the end of theory in science? (embopress.org)

Karbalayghareh, A., X. Qian, and E. R. Dougherty (2018). ‘Optimal Bayesian transfer learning’. IEEE Transactions on Signal Processing, 66 (14).

Meng, X. L. (2014). ‘A trio of inference problems that could win you a Nobel prize in statistics (if you help fund it)’. In X. Lin, C. Genest, D. L. Banks, G. Molenberghs, D. W. Scott, and J. -L. Wang (Eds.):  Past, Present, and Future of Statistical Science, pp. 537–562. CRC Press, Boca Raton, FL.

Merger, C. et al. (20 November 2023). ‘Learning Interacting Theories from Data’. Physical Review X. DOI: 10.1103/PhysRevX.13.041033.

Succi, S. and P. V. Coveney (2019). ‘Big data: the end of the scientific method?’. Philosophical Transactions of the Royal Society A, 377(2142), 20180145. arXiv:1807.09515

Tiao, S. (2024). ‘What is Big data?’. What Is Big Data? | Oracle

Wadhawan, V. K. (2018). Understanding Natural Phenomena: Self-Organization and Emergence in Complex Systems. CreateSpace Independent Publishing Platform, SC, USA.

Tuesday, 21 January 2025

Vinod Wadhawan’s new book ‘Revisiting the Scientific Method’

‘REVISITING THE SCIENTIFIC METHOD: The Need to Make Science More Inclusive in Scope’
by Vinod Kumar Wadhawan 
 
Published by the author.  
Powered by Pothi.com, India 
(Also powered by Amazon Kindle Direct Publishing, USA) 

This 'Print on Demand' book can be ordered from the following websites: 

eBook (Rs. 50):

Paperback (for buyers within India) (Rs. 375):

Paperback (for buyers outside India) (US$ 9.99):

This book is a follow-up to the author’s previous book, ‘The 8-Fold Way of the Scientific Method’. The previous book introduced the reader to the way science is done: strict adherence to objectivity, rationality, and transparency in handling information about a natural phenomenon we want to understand. The present book goes a step further and takes a critical look at the Scientific Method to see what can be done to make science more inclusive in scope; for example, by giving due importance to subjective or experiential information, and not only to empirical information. Together the two books provide a fairly comprehensive account of the nature of scientific research, and can serve as course material for the training of an aspiring scientist.

The theme of consciousness runs throughout the present book, because it is the most important example of a nonphysical phenomenon or entity that current science, by and large, tends to stay away from. Ways are suggested for dealing with this problem by relaxing, in a carefully guarded manner, some of the eight tenets of the present Scientific Method.

Big Data is a rather recent development in the history of science. Its availability is going to have far-reaching consequences for the way science will be done from now on. The book discusses its promises and pitfalls. Present-day science, which is mostly reductionistic in approach, is not adequate for dealing with complex systems. Here again, Big Data may be of big help, because pattern formation is an important characteristic of many complex systems, and Big Data is very good (rather too good!) at discerning patterns or correlations.

Discussion of Karl Popper’s falsifiability criterion occupies substantial space in this short book. This is because this criterion is the main reason why the present Scientific Method labels many questions about Nature as unscientific or nonscientific, thus limiting the scope of scientific enquiry. In a more inclusive approach one would also give due importance to the philosophical rival of falsificationism, namely verificationism. Another way of making science more inclusive is to use a diluted version of falsificationism, formulated by Imre Lakatos. There is also a discussion of the work of some other philosophers of science, notably Nicholas Maxwell and Thomas Kuhn. Popper’s philosophy for doing science has proved to be very successful, but it is desirable that the student of science also be aware of other models of how science can progress.

== 

CONTENTS 
 Preface vii 
 
1. The Scientific Method 1 
1.1 Asking the right question 2 
1.2 Objective observation of the world 4 
1.3 Coming up with hypotheses for understanding the data 6 
1.4 Reproducible verification of predictions of hypotheses 12 
1.5 A theory for explaining the hypotheses 15 
1.6 Use of unambiguous language and logic 18 
1.7 Choice of the smallest necessary set of axioms 21 
1.8 The falsification requirement 25 

 2. Complex Systems, Complexity Transitions 27 
2.1 Complex systems 27 
2.2 Towards a formal definition of a complex system 30 
2.3 Complexity transitions 34 
2.4 Emergence 36 
2.5 Reductionistic science is inadequate for dealing with complex systems 38 

 3. Big Data and the Future of the Scientific Method 41 

 4. Life, Intelligence, Consciousness 53 
4.1 Life 53 
4.2 Intelligence 60 
4.3 Consciousness 62 
4.4 One consciousness or many? 73 

 5. Artificial Life, Intelligence, Consciousness 77 
5.1 Artificial life 77 
5.2 Artificial intelligence 80 
5.3 ChatGPT 81 
5.4 Artificial consciousness, or machine consciousness 87 
5.5 Can a robot acquire consciousness? 90 
5.6 Have some of our machines already become sentient? 91 
5.7 Apocalyptic AI and transhumanism 94 

 6. Towards a More Inclusive Scientific Method 99 
6.1 Asking the right question 99 
6.2 Observation of the world 100 
6.3 Coming up with hypotheses for understanding the data 102 
6.4 Testing of predictions of hypotheses 104 
6.5 A theory for explaining the hypotheses 105 
6.6 The language and logic of science 106 
6.7 Choice of axioms 107 
6.8 How justified is the falsifiability requirement? 108 

 7. Concluding Remarks 115 

 Bibliography 123

 Index 135 

 == 

Preface 

Nature is all there is. All phenomena are natural phenomena, and the Scientific Method is the method of choice for investigating them. The study of all natural phenomena should come within the purview of science. But the strictness of the Scientific Method presently used makes it inapplicable for investigating certain nonphysical phenomena in their entirety, a good example being that of consciousness. In a general sense, consciousness is a state of awareness of one’s thoughts, feelings, sensations, and surroundings. It is the subjective experience of perception, cognition, and emotions that is sometimes described as the ‘sense of self’. Consciousness is a fundamental aspect of human experience and an essential characteristic of the human mindbody, and of many other life forms. An important requirement of the Scientific Method is that in any scientific discourse, every word used must convey the same meaning to all concerned. Words like ‘consciousness’ create problems on that score. Although the neural correlates of consciousness help us in taking an empirical view of the situation, the problem remains that we do not know how physical processes in the brain give rise to subjective experiences. How can we remedy the situation so that we can have meaningful scientific dialogue about such natural phenomena too?

In my previous book (The 8-Fold Way of the Scientific Method) I gave a fairly comprehensive discussion of what the Scientific Method is all about, and how science is done by applying it. The achievements of the method have been spectacular, and we humans can be truly proud that we invented it. But, as pointed out near the end of that book, there is a need to relax its dictums (in a carefully guarded way) so that the scope of scientific investigations can become more inclusive. I suggested some ways there, and discuss the matter more comprehensively in the present book.

There is a related issue, namely that of understanding complex systems. The human mindbody is perhaps the most complex system of them all. The present Scientific Method is suitable mainly for doing reductionistic science. A characteristic feature of a complex system is that it must be investigated as a whole, and not by reducing it into parts and assuming that if we understand the parts, we understand the whole. For complex systems, the whole is more than the sum of the parts, in a mutually interactive manner. An extended Scientific Method that is effective for studying complex systems will automatically become more inclusive in scope.

Big Data is an inalienable feature of life in modern times. It is already influencing science and many other activities like finance, business, advertising, entertainment, government, warfare, etc. There is a viewpoint that Big Data can make it possible to do science without having to postulate hypotheses/models/theories beforehand. This may well be too optimistic, but there is no denying the fact that the availability of Big Data offers new opportunities for making science progress rapidly, particularly for investigating complex systems. Big Data Analytics can throw up unexpected correlations in the data, which may provide new leads for understanding a complex system. There is already a report that an artificial-intelligence system has been developed that is capable of formulating physical theories by recognizing patterns in complex data sets.

So, we need an extended Scientific Method that makes science more inclusive in scope, that enables science to investigate complex systems by going beyond reductionism, and that incorporates the use of Big Data as an additional tool for doing science (data-driven science). In this book I explore the possibilities and make some suggestions. If insistence on empirical evidence limits the scope of science, then go for non-empirical and experiential evidence also. If the falsifiability requirement of the Scientific Method is too restrictive, then see what can be done to relax it in a carefully guarded and tentative manner. Big Data is offering wonderful new and highly unconventional ways of doing science. The training of scientists must now include awareness of this option, as well as an exposure to the basics of how such research should be done.

Vinod Wadhawan, New Delhi (January 2025)