Data Input and Output#


File Input and Output (File I/O)#

File Input/Output (File I/O) refers to the operations that read data from, and write data to, files on your computer system. File I/O is a common task in programming because it lets you store and retrieve data from persistent storage, such as a hard drive or a cloud service, e.g., Box, Google Drive, or OneDrive.

In most programming languages, File I/O operations are performed using a set of built-in functions or methods that allow you to open, read, write, and close files. These files may be of standard types such as comma-separated value (CSV) files, or more modern types such as Tom’s Obvious, Minimal Language (TOML) files, JavaScript Object Notation (JSON) files, or YAML files.

CSV files#

Comma-separated value (CSV) files are delimited text files that use a comma to separate values. A CSV file typically stores tabular data (numbers and text) in plain text, where each line has the same fields. Each line of the file is a record consisting of one or more fields, separated by commas (or other characters such as a tab or space character).

In engineering or other quantitative applications, comma-separated value files are typically used to store, transmit and work with numerical data. Consider a comma-separated value file holding interest rate data for the last year:

Date,T=20-year-percentage,T=30-year-percentage
2021-09-17,1.82,1.88
2021-09-24,1.84,1.89
2021-10-01,2.00,2.05
2021-10-08,2.05,2.10
....

This data file has a header row containing column labels (text), while the other rows contain numerical data. Each row holds a data record that is composed of fields that have a type of data, e.g., the date or values for the spot rate for fixed income United States treasury debt securities.
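
To make the idea of typed fields concrete, here is a minimal sketch that splits one record from the example above and converts each field to the type it represents. It uses only the Dates standard library; the variable names (record, fields, date, rate20, rate30) are illustrative assumptions and not part of the course code.

```julia
using Dates

# one record from the example file above -
record = "2021-09-17,1.82,1.88"

# split the record into its text fields -
fields = split(record, ',')

# convert each field to the type it represents -
date = Date(fields[1], dateformat"yyyy-mm-dd")  # the first field is a calendar date
rate20 = parse(Float64, fields[2])              # the 20-year rate, in percent
rate30 = parse(Float64, fields[3])              # the 30-year rate, in percent
```

Every field arrives as text; it is the program reading the file that decides whether a field should become a date, a number, or stay a string.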

Program: Read and write a CSV file#

Let’s develop a Julia function that loads a comma-separated value file from the local filesystem; first, let’s use the CSV.jl package, which is a specialized package to work with comma-separated value files:

using DataFrames
using CSV

function loadcsvfile(path::String)::DataFrame
    
    # check: is the path arg legit?
    # ...
    return CSV.read(path, DataFrame)
end

# set the path -
path_to_file = "Treasury-HistoricalData-09-09-22.csv";

# load CSV file -
df = loadcsvfile(path_to_file);

This first example takes advantage of the CSV.jl package and returns a table-like data structure called a DataFrame (which is implemented by the DataFrames.jl package). The DataFrame data structure (which we’ll explore later) offers several standard and advanced features for working with tabular data.
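
As a brief preview of working with a DataFrame, the sketch below reads the first few records of the example data from an in-memory string instead of a file, so that it runs without the data file. It assumes the CSV.jl and DataFrames.jl packages are installed; the variable names are illustrative.

```julia
using CSV
using DataFrames

# the first few records of the example file, held in memory as text -
csvtext = """
Date,T=20-year-percentage,T=30-year-percentage
2021-09-17,1.82,1.88
2021-09-24,1.84,1.89
2021-10-01,2.00,2.05
2021-10-08,2.05,2.10
"""

# CSV.read accepts an IO source as well as a file path -
df = CSV.read(IOBuffer(csvtext), DataFrame)

# inspect the table -
column_names = names(df)                 # column labels taken from the header row
number_of_records = nrow(df)             # number of data records (header excluded)
rates20 = df[!, "T=20-year-percentage"]  # a single column, selected by its label
```

Because the column labels contain characters such as = and -, the columns are selected here with string labels instead of the df.name shorthand.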

One possible criticism of the first loadcsvfile() implementation is that all the details of what is going on are hidden in the CSV.read() call. However, there may be cases or applications where we may want more control when reading (or writing) comma-separated value files.

Program: Read a CSV file refactored#

Let’s refactor the previous loadcsvfile() function so that we have access to each record as it is being loaded:

function loadcsvfile(path::String; delim::Char=',', keyindex::Int64 = 1)::Tuple{Array{String,1}, Dict{String,Array{Float64,1}}}
    
    # check: is the path arg legit?
    # ....

    # initialize
    counter = 1
    header = Array{String,1}()
    data = Dict{String,Array{Float64,1}}()

    # main -
    open(path, "r") do io # open a stream to the file
        for line in eachline(io) # read each line from the stream
            
            # split the line around the delim -
            fields = split(line, delim);
            if (counter == 1)
                
                # package the header -
                for value in fields
                    push!(header, value)
                end

                # update the counter -
                counter = counter + 1
            else

                # package -
                tmp = Array{Float64,1}()
                keyfield = fields[keyindex]
                for (i,value) in enumerate(fields)
                    if (i != keyindex)
                        push!(tmp, parse(Float64,value))
                    end
                end
                data[keyfield] = tmp;
            end
        end
    end

    # return -
    return (header, data)
end

# set the path -
path_to_file = "Treasury-HistoricalData-09-09-22.csv";

# load file -
(h,d) = loadcsvfile(path_to_file);
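
The same low-level approach works for writing. The sketch below is a hypothetical counterpart, writecsvfile (a name introduced here for illustration), that writes a header and a dictionary of keyed records to disk. It assumes the key belongs in the first column, i.e., keyindex = 1 when the data was loaded, and uses small sample values in place of the Treasury data file.

```julia
# hypothetical counterpart to loadcsvfile: writes the header and keyed records to disk
function writecsvfile(path::String, header::Array{String,1},
    data::Dict{String,Array{Float64,1}}; delim::Char=',')

    open(path, "w") do io # open a write stream to the file

        # write the header row -
        println(io, join(header, delim))

        # write one record per key; sort the keys so the row order is reproducible -
        for key in sort(collect(keys(data)))
            println(io, join(vcat(key, string.(data[key])), delim))
        end
    end
end

# sample data in the same shape that loadcsvfile returns -
h = ["Date", "T=20-year-percentage", "T=30-year-percentage"]
d = Dict("2021-09-17" => [1.82, 1.88], "2021-09-24" => [1.84, 1.89])

# write to a temporary file, then read the lines back to check the result -
path_to_output_file = joinpath(mktempdir(), "Output.csv")
writecsvfile(path_to_output_file, h, d)
lines = readlines(path_to_output_file)
```

A Dict does not remember the order in which records were inserted, so the sketch sorts the keys before writing. If the original row order matters, storing the records in an array is a safer design choice.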

TOML files#

TOML (Tom’s Obvious, Minimal Language) is a configuration file format intended to be easy to read and write and easy to parse. It is used to store application configuration data. TOML files consist of key-value pairs, similar to a dictionary in Julia or Python, and can also include nested groups of keys. TOML files often have a .toml file extension.

# This is a TOML configuration file for a database

# section: holds connection information
[connection]
host = "localhost"      # The database hostname
port = 5432             # The port to connect to the database on
database = "mydatabase" # The name of the database
user = "myuser"         # The username to connect to the database with
password = "mypassword" # The password for the user
max_connections = 10    # The maximum number of connections to allow at once
connection_timeout = 30 # The amount of time to wait before timing out a connection

# section: holds a group of database options
[options]
ssl = true              # Whether to enable SSL connections to the database
ssl_mode = "require"    # The preferred SSL mode to use

TOML files are widely used for storing configuration information. For example, in Julia, the package manager Pkg.jl records the packages required for a project in a Project.toml file (which Pkg creates automatically, e.g., when the first package is added to an activated project). Because of its central role in Julia, TOML.jl, the package to read and write TOML files, is included in the Julia standard library. Thus, we don’t need to install it and can access it by placing a using TOML statement at the start of our program.

# load packages 
using TOML

"""
    readtomlfile(path::String)::Dict{String,Any}

Load the TOML file at the path arg. Returns a Dict{String,Any} 
containing the TOML data.
"""
function readtomlfile(path::String)::Dict{String,Any}

    # check: does path point to a toml file?
    # ...

    return TOML.parsefile(path)
end

# setup path -
path_to_toml_file = "Database.toml"

# load -
d = readtomlfile(path_to_toml_file);
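
Once loaded, the TOML data is an ordinary nested dictionary. The sketch below parses a shortened version of the configuration above from a string (so it runs without the Database.toml file), reads a few values, and writes the dictionary back out as TOML text. The variable names are illustrative.

```julia
using TOML

# a shortened version of the configuration file above, held in memory as text -
tomltext = """
[connection]
host = "localhost"
port = 5432

[options]
ssl = true
"""

# TOML.parse works on a string; TOML.parsefile does the same for a file path -
config = TOML.parse(tomltext)

# each [section] becomes a nested dictionary keyed by the section name -
host = config["connection"]["host"]
port = config["connection"]["port"]
ssl = config["options"]["ssl"]

# TOML.print writes a dictionary back out as TOML text -
buffer = IOBuffer()
TOML.print(buffer, config)
roundtrip = TOML.parse(String(take!(buffer)))
```

Note that the values keep their TOML types: port is an integer and ssl is a Boolean, so no further parsing is needed.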

For more information on TOML, see the TOML specification, which describes the TOML format and the different types of data that can be stored in a TOML file.

JSON files#

JavaScript Object Notation (JSON) is a lightweight, text-based, language-independent data interchange format that is easy for humans to read and write and easy for machines to parse and generate. JSON is based on a subset of the JavaScript programming language and represents simple data structures and associative arrays. JSON is built on two data structures:

  • A collection of name/value pairs typically realized as a struct, dictionary, keyed list, or associative array.

  • An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

Here is an example of a JSON file that stores contact information:

{
  "people": [
    {
      "name": "John Smith",
      "email": "john@example.com",
      "phone": "555-555-5555"
    },
    {
      "name": "Jane Smith",
      "email": "jane@example.com",
      "phone": "444-444-4444"
    }
  ]
}

This JSON file defines an object with a single key, people; the value of the people key is a list of objects, each representing a person. Each person object has three keys: name, email, and phone. Unlike TOML, support for the JSON format is not included in the Julia standard library. Instead, a variety of third-party packages are available for reading and writing JSON files, e.g., the JSON.jl package:

# load required packages
using JSON

"""
    readjsonfile(path::String)::Dict{String,Any}

Load the JSON file at the path arg. Returns a Dict{String,Any} 
containing the JSON data.
"""
function readjsonfile(path::String)::Dict{String,Any}

    # check: does path point to a json file?
    # ...

    return JSON.parsefile(path)
end

# setup path -
path_to_json_file = "Contacts.json"

# load -
d = readjsonfile(path_to_json_file);
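
The parsed JSON data can then be navigated with ordinary dictionary and array indexing. The sketch below parses the contact data from a string (so it runs without the Contacts.json file) and assumes the JSON.jl package is installed; the variable names are illustrative.

```julia
using JSON

# the contact data from the example above, held in memory as text -
jsontext = """
{
  "people": [
    {"name": "John Smith", "email": "john@example.com", "phone": "555-555-5555"},
    {"name": "Jane Smith", "email": "jane@example.com", "phone": "444-444-4444"}
  ]
}
"""

# JSON.parse works on a string; JSON.parsefile does the same for a file path -
contacts = JSON.parse(jsontext)

# a JSON object is indexed by key, a JSON list by position (starting at 1 in Julia) -
people = contacts["people"]
number_of_people = length(people)
first_name = people[1]["name"]
emails = [person["email"] for person in people]

# JSON.json serializes a Julia dictionary back to JSON text -
jsonout = JSON.json(Dict("name" => "John Smith"))
```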

For more information on JSON, see the JSON specification, which describes the JSON format and the different types of data that can be stored in a JSON file.

YAML files#

YAML (YAML Ain’t Markup Language) is a human-readable data serialization language that can be used to transmit data between systems. YAML is often a configuration file format for applications, similar to TOML. YAML files use a simple syntax that consists of key-value pairs and can also include nested groups of keys. YAML uses indentation to denote structure, similar to Python. YAML files often have a .yaml or .yml file extension.

Here is an example of a YAML file that could be used to store configuration data for an application:

# This is a YAML configuration file for an application

# Meta data about MyApp
name: MyApp             # The application's name
version: 1.0.0          # The version of the application
host: localhost         # The hostname to bind the application to
port: 8080              # The port to bind the application to

# A group of database options
database:
  host: localhost       # The database hostname
  port: 5432            # The port to connect to the database on
  name: mydatabase      # The name of the database
  user: myuser          # The username to connect to the database with
  password: mypassword  # The password for the user

Unlike TOML, support for the YAML format is not included in the Julia standard library. Instead, various third-party packages are available for working with YAML files, e.g., the YAML.jl package:

# load packages 
using YAML

"""
    readyamlfile(path::String)::Dict{String,Any}

Load the YAML file at the path arg. Returns a Dict{String,Any} 
containing the YAML data.
"""
function readyamlfile(path::String)::Dict{String,Any}

    # check: does path point to a yaml file?
    # ...

    return YAML.load_file(path)
end

# setup path -
path_to_yaml_file = "MyApp.yaml"

# load -
d = readyamlfile(path_to_yaml_file);
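
As with TOML, the indented groups in a YAML file become nested dictionaries. The sketch below parses a shortened version of the configuration above from a string (so it runs without the MyApp.yaml file) and assumes the YAML.jl package is installed; the variable names are illustrative.

```julia
using YAML

# a shortened version of the configuration file above, held in memory as text -
yamltext = """
name: MyApp
port: 8080
database:
  host: localhost
  port: 5432
"""

# YAML.load works on a string; YAML.load_file does the same for a file path -
config = YAML.load(yamltext)

# top-level keys are indexed directly; indented groups are nested dictionaries -
app_port = config["port"]
db_host = config["database"]["host"]
db_port = config["database"]["port"]
```

The two port keys do not collide because they live at different levels of the hierarchy: one belongs to the application and the other to the database group.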

For more information on YAML, see the YAML specification, which describes the YAML format and the different types of data that can be stored in a YAML file.

Web services and APIs#

A web service is a software system that enables machine-to-machine interaction over a network. Web services are often used to make the functionality of one application available to other applications or to provide data from one application to another. Web services can be accessed through a specified set of rules called application programming interfaces (APIs).

  • An API is a set of programming instructions for accessing a web-based software application. It specifies how software components should interact and allows communication between different systems. Thus, an API defines how a developer writes a program that requests services, such as data over the Internet.

  • A web service is a specific type of API that uses the HTTP protocol to exchange data over the internet. Web services can transfer data in various formats, such as JSON or CSV.

  • Finally, APIs are often associated with a specific web service implementation, and the two terms are often used interchangeably. For example, a website may have a public API and various internal APIs to manage its components and features.

RESTful APIs#

Here, we explore a particular type of API called a RESTful API, designed to be lightweight, flexible, and scalable. A RESTful API follows the Representational State Transfer (REST) architectural style, a widely used design pattern for web services:

  • Client-server architecture: In a RESTful API, the client, e.g., a web browser or a mobile application, and the server, e.g., a web server or a database in the cloud, are separated and communicate through a network, usually the Internet. This allows the client and the server to be developed and maintained independently, making the system more flexible and scalable.

  • Statelessness: In a RESTful API, the server does not maintain information about interactions with the client. This means that each client request must contain all the necessary information to understand and process the request; the server does not store any information about the client between requests. This makes the system easier to scale and maintain, as the server does not need to keep track of information about the client’s state.

  • Cacheability: RESTful APIs are designed to be cacheable, meaning that the server’s responses can be stored and reused by the client or a cache in the network. This increases the system’s efficiency; it reduces the need to make unnecessary requests to the server.

  • Layered system: A RESTful API can be used over a network of interconnected servers, each performing a specific task. This makes the system scalable and maintainable, as it can be built and deployed modularly.

RESTful APIs are often used to expose the functionality of a web service or a database over the Internet, allowing clients to interact with the service using the HTTP request-response model. RESTful APIs are widely used in web and mobile development and are essential for building modern applications.
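
To illustrate statelessness, the sketch below builds the URL for a single request so that the request carries everything the server needs: the resource, the query parameters, and the credentials. The helper buildurl, the endpoint https://api.example.com/v1, the rates resource, and the parameter names are all hypothetical and chosen only for illustration. The sketch also assumes that the parameter values are already URL-safe, so it does no escaping.

```julia
# hypothetical helper: builds the full URL for one request from its parts
function buildurl(baseurl::String, resource::String, parameters::Dict{String,String})::String

    # encode the parameters as a query string; sort the keys so the result is reproducible -
    sortedkeys = sort(collect(keys(parameters)))
    query = join(["$(key)=$(parameters[key])" for key in sortedkeys], "&")

    # return the complete URL -
    return "$(baseurl)/$(resource)?$(query)"
end

# every request names the resource, the query, and the credentials it needs
# (hypothetical endpoint, parameters, and placeholder API key) -
baseurl = "https://api.example.com/v1"
parameters = Dict("maturity" => "20-year", "date" => "2021-09-17", "apikey" => "PLACEHOLDER")
url = buildurl(baseurl, "rates", parameters)
```

Because nothing about earlier requests is assumed, any server in a layered system could answer this request, and an identical request could be answered from a cache.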

What is HTTP?#

The Hypertext Transfer Protocol (HTTP) is a network protocol used for the transmission of data on the World Wide Web. HTTP allows for communication between clients (such as web browsers) and servers on the web. HTTP is based on a request-response model, where the client sends a request to the server, and the server responds with the requested resource or an error message if the request cannot be fulfilled.

For example, when looking for our class notes website:

  • Request: When you enter a URL into your web browser, the browser sends an HTTP request message to the server hosting the website. Thus, an HTTP request message is a question to a server written in a particular format that the server understands.

  • Response: Upon receiving the request, a server responds with an HTTP response message, which includes the website’s content and information about how to display it in the browser.

Requesting data from an application programming interface works similarly; we make an HTTP request to the server and get back an HTTP response object. However, in this case, the HTTP response object is not a webpage; instead, we get data, typically organized in some text format such as JavaScript Object Notation (JSON) that we can read in our program. Alternatively, we get an error message indicating that something went wrong with our request.
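
The request-response exchange described above can be sketched in Julia as follows. This is a minimal sketch, assuming the third-party HTTP.jl and JSON.jl packages are installed; the helper names processresponse and getjson are hypothetical, and the sketch does not target any particular API. The processing step is kept separate from the network call so that it can be tried with a canned response body.

```julia
using HTTP
using JSON

# hypothetical helper: turn a status code and a response body into data or an error
function processresponse(status::Integer, body::AbstractString)

    # status codes in the 200s indicate success; anything else is treated as a failure here -
    if (200 <= status < 300)
        return JSON.parse(body)
    end
    error("request failed with status $(status): $(body)")
end

# hypothetical helper: make a GET request to url and process the response
function getjson(url::String)

    # status_exception = false: return the response for error codes instead of throwing -
    response = HTTP.get(url; status_exception = false)
    return processresponse(response.status, String(response.body))
end

# the processing step can be tried without a network connection -
data = processresponse(200, """{"name": "John Smith"}""")
```

Calling getjson with the URL of a JSON endpoint would return the parsed data on success, or raise an error that includes the status code and the message sent back by the server.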

Let’s develop a function that makes a GET request to a dummy application programming interface, as shown in Example 12:


Summary#

Working with data is a common task in any computational project. For example, loading data into a program, doing some calculations using that data, and then saving the results to a file (locally or in the cloud) describes most programs you will write in this course and likely in your future career. In this lecture, we introduced topics in data input and output to support these efforts:

  • File Input and Output (File I/O) operations allow you to read from and write to files on your computer’s filesystem. We explored reading and writing data in standard formats such as comma-separated value (CSV) files, JavaScript Object Notation (JSON) files, TOML files, and YAML files in the context of the Julia programming language.

  • Then we introduced Web services and APIs and considered how to make requests to web services and APIs and process the data returned from these requests. We also introduced the Hypertext Transfer Protocol (HTTP), the protocol to transfer data over the web.