posted on 2023-11-10, 02:09authored byAnthony Peter James Shaw
<p>The two most popular Computer Programming languages for Data Science are Python, and R. Both are dynamically typed, interpreted languages. Python first appeared in 1991, and R in 1993. Thirty years later, computing architecture has moved toward parallel computation to compensate for decreased annual performance improvement for CPUs. The performance improvement rates for the period 1986-2003 were 52% per year then declining to 22% from 2003-2015. Modern CPU architecture features multiple cores that can parallelize execution even further with technologies such as Hyper-Threading. Despite this trend, Python and R are tied to their 1990s architectural design and are written with a single-threaded Global Interpreter Lock (Python) or a single-threaded interpreter (R). Furthermore, advancing massively parallel AI and Machine Learning algorithms requires support for implementations in code to leverage hardware with over 100 cores using GPU programming APIs. This research will explore the limitations of dynamically typed, interpreted languages on Data Science engineering. This research will demonstrate what factors are limiting the scalability of Python and what alternatives are being developed. This research will propose where investments in the industry should be placed to ensure long-term support for data mining, processing, and engineering applications. </p>
History
Table of Contents
1. Introduction -- 2. Background and state of the art -- 3. Methodology -- 4. Experiments and evaluation -- 5. Conclusion and future work -- A. Appendix - Benchmarks -- List of symbols -- Bibliography
Awarding Institution
Macquarie University
Degree Type
Thesis MRes
Degree
Master of Research
Department, Centre or School
School of Computing
Year of Award
2023
Principal Supervisor
Amin Beheshti
Additional Supervisor 1
Xuyun Zhang
Rights
Copyright: The Author
Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer