Data

We've included several datasets in the package that we use for examples, activities, etc.

VLDataScienceMachineLearningPackage.MyStringDecodeChallengeDataset - Function
MyStringDecodeChallengeDataset() -> NamedTuple

Load the String Decode Challenge testing and production datasets.

Returns

  • NamedTuple: A tuple containing the three datasets:
    • test_part_1: The first part of the test dataset.
    • test_part_2: The second part of the test dataset.
    • production: The production dataset.
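A minimal sketch of how the returned NamedTuple could be consumed. The stand-in value below only mirrors the three documented field names; the element type (a vector of strings) and the placeholder contents are assumptions for illustration, not the package's actual data.

```julia
# Stand-in with the documented field names. The String vectors are placeholders
# (an assumption); with the package loaded, the real call would be:
#   dataset = MyStringDecodeChallengeDataset()
dataset = (
    test_part_1 = ["placeholder line"],
    test_part_2 = ["placeholder line"],
    production  = ["placeholder line"],
)

# Iterate over the fields by name and report how many entries each holds.
for name in keys(dataset)
    part = dataset[name]
    println(name, ": ", length(part), " entries")
end

# Individual datasets are also reachable with dot syntax.
test_1 = dataset.test_part_1
```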
VLDataScienceMachineLearningPackage.MySarcasmCorpus - Function
function MySarcasmCorpus() -> MySarcasmRecordCorpusModel

MySarcasmCorpus reads a file composed of JSON records and returns the data as a MySarcasmRecordCorpusModel instance. Each record in the file is expected to have the following fields:

  • is_sarcastic::Bool - a boolean value indicating if the headline is sarcastic.
  • headline::String - the headline of the article.
  • article_link::String - the link to the article.

Returns

  • MySarcasmRecordCorpusModel - the data from the file as a MySarcasmRecordCorpusModel instance.
VLDataScienceMachineLearningPackage.MyGraphEdgeModels - Function
function MyGraphEdgeModels(filepath::String, edgeparser::Function; comment::Char='#', 
delim::Char=',')::Dict{Int64,MyGraphEdgeModel}

Parses an edge file and returns a dictionary of edge models.

Arguments

  • filepath::String: The path to the edge file.
  • edgeparser::Function: A callback function that parses each edge line. It should take a line and a delimiter character as input and return a tuple of the form (source, target, data), where:
    • source::Int64: The source node ID.
    • target::Int64: The target node ID.
    • data::Any: Any additional data associated with the edge, e.g., a weight, a tuple of information, etc.
  • comment::Char: The character that marks a comment line in the edge file (default: '#').
  • delim::Char: The delimiter character that separates fields on each line (default: ',').

Returns

  • Dict{Int64,MyGraphEdgeModel}: A dictionary of edge models.
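As an illustration of the edgeparser contract described above, here is a minimal sketch of a callback that turns one delimited line into a (source, target, data) tuple. The name parse_weighted_edge, the three-column line layout, and the choice of a Float64 weight as the data element are all assumptions made for this example; they are not part of the package.

```julia
# Hypothetical callback: parses one line of the assumed form "source,target,weight"
# and returns the (source, target, data) tuple that the edgeparser argument expects.
function parse_weighted_edge(line::AbstractString, delim::Char)::Tuple{Int64,Int64,Float64}
    fields = split(line, delim)                # split the line on the delimiter
    source = parse(Int64, strip(fields[1]))    # source node ID
    target = parse(Int64, strip(fields[2]))    # target node ID
    weight = parse(Float64, strip(fields[3]))  # edge data: here, a numeric weight
    return (source, target, weight)
end

# Try the callback on a single line.
edge = parse_weighted_edge("1,2,0.5", ',')

# With the package loaded and an edge file on disk (the path is a placeholder),
# the callback would be passed as the second argument:
#   edges = MyGraphEdgeModels("path/to/edges.csv", parse_weighted_edge; delim=',')
```

Because data is typed as Any, the callback could just as well return a tuple or a struct in the third position when an edge carries more than a single weight.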