RacketFrames: A DataFrame Implementation For The Racket Programming Language

The DataFrame is a powerful table-like data structure used frequently in Data Science, an in-demand field focused on extracting valuable insights from data. Datasets are typically imperfect upon collection and must be prepared before they are useful for statistical analysis. A DataFrame API supports optimized methods for selecting, aggregating, and filtering rows, columns, and cells, as well as for renaming row and column labels. It also supports normalizing data, merging data, adding new columns, and labelling missing data, among numerous other features. An API for working with tabular data is useful in any general-purpose language, so DataFrames have been incorporated into libraries such as Pandas for Python and provided as native libraries in R and Scala. Because of their wide-ranging use, implementations are also found in many other languages, such as Java and Julia \cite{BigData}.
In this work, we introduce RacketFrames, a DataFrame implementation for Racket 8.0+. We show the benefits such an implementation can have for existing and future Racket projects. To quantify the performance of major DataFrame operations, we measure their speed against Python's Pandas and compare functional and object-oriented paradigms. We hope to continue the trend of Data Science tool development for Racket and other programming languages.
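
As a rough illustration of the kinds of operations the abstract lists (selecting, filtering, aggregating, renaming, normalizing, and handling missing data), the following is a minimal sketch in plain Python. It is not the RacketFrames API and not Pandas: the column-oriented dictionary layout, the use of `None` for missing cells, the sample data, and every function name here are assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical column-oriented table: column label -> list of cell values.
# None marks a missing cell. The data is made up for illustration.
table = {
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
    "temp": [4.0, 19.5, None, 31.0],
}

def select(tbl, labels):
    """Keep only the named columns."""
    return {label: tbl[label] for label in labels}

def filter_rows(tbl, column, predicate):
    """Keep rows whose value in `column` satisfies `predicate`."""
    keep = [i for i, v in enumerate(tbl[column]) if predicate(v)]
    return {label: [values[i] for i in keep] for label, values in tbl.items()}

def rename(tbl, mapping):
    """Rename column labels; labels absent from `mapping` are unchanged."""
    return {mapping.get(label, label): values for label, values in tbl.items()}

def aggregate(tbl, key, column, fn):
    """Group rows by `key` and reduce `column` with `fn`, skipping missing cells."""
    groups = {}
    for k, v in zip(tbl[key], tbl[column]):
        bucket = groups.setdefault(k, [])
        if v is not None:
            bucket.append(v)
    # Groups with no non-missing values are dropped from the result.
    return {k: fn(vs) for k, vs in groups.items() if vs}

def min_max_normalize(values):
    """Scale non-missing values to [0, 1]; missing cells stay None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [None if v is None else (v - lo) / span for v in values]

known = filter_rows(table, "temp", lambda v: v is not None)  # drop rows with missing temp
by_city = aggregate(table, "city", "temp", mean)             # mean temp per city
renamed = rename(table, {"temp": "temp_c"})                  # relabel a column
normalized = min_max_normalize(table["temp"])                # rescale one column
```

Each function returns a new table rather than mutating its input, which loosely mirrors the functional style the abstract contrasts with the object-oriented one; a real implementation would use typed, indexed column storage rather than plain lists.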

Identifier: oai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-4221
Date: 01 March 2023
Creators: Kahal, Shubham
Publisher: DigitalCommons@CalPoly
Source Sets: California Polytechnic State University
Detected Language: English
Type: text
Format: application/pdf
Source: Master's Theses
