Data
We've included several datasets in the package that we use for examples, activities, etc.
VLDataScienceMachineLearningPackage.MyKaggleCustomerSpendingDataset
— Function
MyKaggleCustomerSpendingDataset() -> DataFrame
Load the Kaggle customer spending dataset as a DataFrame. The original dataset can be found at: Spending dataset.
VLDataScienceMachineLearningPackage.MyStringDecodeChallengeDataset
— Function
MyStringDecodeChallengeDataset() -> NamedTuple
Load the String Decode Challenge testing and production datasets.
Return
NamedTuple: A tuple containing the three datasets:
test_part_1: The first part of the test dataset.
test_part_2: The second part of the test dataset.
production: The production dataset.
VLDataScienceMachineLearningPackage.MyCommonSurnameDataset
— Function
MyCommonSurnameDataset() -> DataFrame
Load the common surnames dataset by country as a DataFrame. The original dataset can be found at: Common Surnames by Country.
VLDataScienceMachineLearningPackage.MyCommonForenameDataset
— Function
MyCommonForenameDataset() -> DataFrame
Load the common forenames dataset by country as a DataFrame. The original dataset can be found at: Common Forenames by Country.
VLDataScienceMachineLearningPackage.MySarcasmCorpus
— Function
function MySarcasmCorpus() -> MySarcasmRecordCorpusModel
The function MySarcasmCorpus reads a file composed of JSON records and returns the data as a MySarcasmRecordCorpusModel instance. Each record in the file is expected to have the following fields:
is_sarcastic::Bool - a boolean value indicating if the headline is sarcastic.
headline::String - the headline of the article.
article_link::String - the link to the article.
Returns
MySarcasmRecordCorpusModel - the data from the file as a MySarcasmRecordCorpusModel instance.
VLDataScienceMachineLearningPackage.MySMSSpamHamCorpus
— Function
function MySMSSpamHamCorpus() -> MySMSSpamHamRecordCorpusModel
The function MySMSSpamHamCorpus reads the SMS Spam Ham dataset and returns the data as a MySMSSpamHamRecordCorpusModel instance.
VLDataScienceMachineLearningPackage.MyTrainingMarketDataSet
— Function
MyTrainingMarketDataSet() -> Dict{String, DataFrame}
Load the components of the S&P 500 daily open, high, low, close (OHLC) dataset as a dictionary of DataFrames. This data was provided by Polygon.io and covers the period from January 3, 2014, to December 31, 2024.
VLDataScienceMachineLearningPackage.MyGraphEdgeModels
— Function
function MyGraphEdgeModels(filepath::String, edgeparser::Function; comment::Char='#',
    delim::Char=',')::Dict{Int64,MyGraphEdgeModel}
Function to parse an edge file and return a dictionary of edge models.
Arguments
filepath::String: The path to the edge file.
edgeparser::Function: A callback function to parse each edge line. This function should take a line and a delimiter character as input, and return a tuple of the form (source, target, data), where:
source::Int64: The source node ID.
target::Int64: The target node ID.
data::Any: Any additional data associated with the edge, e.g., a weight, a tuple of information, etc.
Returns
Dict{Int64,MyGraphEdgeModel}: A dictionary of edge models.
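Example
The edgeparser callback described above could look something like the following minimal sketch. The name parse_weighted_edge, the argument order (line first, then delimiter), the "source,target,weight" line layout, and the default weight of 1.0 are all assumptions made for illustration; they are not taken from the package.

```julia
# Hypothetical edgeparser callback (illustration only, not part of the package).
# Assumption: each line looks like "source,target,weight", e.g. "1,2,0.5".
function parse_weighted_edge(line::AbstractString, delim::Char)
    # Split the line on the delimiter and trim whitespace around each field.
    fields = split(strip(line), delim)
    source = parse(Int64, strip(fields[1]))
    target = parse(Int64, strip(fields[2]))
    # Assumption: fall back to a weight of 1.0 when no third field is present.
    weight = length(fields) >= 3 ? parse(Float64, strip(fields[3])) : 1.0
    return (source, target, weight)
end

# Parse a single sample line into a (source, target, data) tuple.
edge = parse_weighted_edge("1,2,0.5", ',')
```

With the package loaded, a callback like this would presumably be passed as the second argument, e.g. MyGraphEdgeModels("edges.csv", parse_weighted_edge), where edges.csv is a hypothetical file path.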