Package 'opendataformat' reference manual

Title:	Reading and Writing Open Data Format Files
Description:	The Open Data Format (ODF) is a new, non-proprietary, multilingual, metadata enriched, and zip-compressed data format with metadata structured in the Data Documentation Initiative (DDI) Codebook standard. This package allows reading and writing of data files in the Open Data Format (ODF) in R, and displaying metadata in different languages. For further information on the Open Data Format, see <https://opendataformat.github.io/>.
Authors:	Tom Hartl [aut, cre], Claudia Saalbach [ctb]
Maintainer:	Tom Hartl <[email protected]>
License:	MIT + file LICENSE
Version:	2.2.0
Built:	2025-03-10 11:14:25 UTC
Source:	https://github.com/opendataformat/r-package-opendataformat

Open Data Format

Description

The package is designed to support the use of the Open Data Format. For this purpose, the following main functions have been developed:

read_odf()

Import data from the Open Data Format to an R data frame.

write_odf()

Export data from an R data frame to the open data format.

docu_odf()

Get access to information about the dataset and variables via the RStudio Viewer or the web browser.

setlanguage_odf()

Set the default language for displaying the metadata with docu_odf() and getmetadata_odf().

getmetadata_odf()

Retrieve specific metadata such as variable labels or value labels.

as_odf_tbl()

Convert data frame (data.frame object or any subclass) to an ODF tibble (odf_tbl class object).

Author(s)

Tom Hartl ([email protected]), Claudia Saalbach ([email protected])

Other Contributors: KonsortSWD/NFDI, DIW Berlin

Converts a data frame to odf_tbl

Description

Converts a data.frame (or any subclass) object to an odf_tbl (Open Data Format tibble).

Usage

as_odf_tbl(x, active_language = "en", language_of_metadata = NA)

Arguments

`x`	A data.frame to be converted to an odf_tbl.
`active_language`	The language to use as the active metadata language. Default is 'en', or the first language occurring.
`language_of_metadata`	Language to assign to metadata that has no language tag, i.e. metadata stored in the attributes label, description and labels. Default is NA.

Value

odf_tbl with attributes including dataset and variable information.
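
The attributes in the Examples below follow a naming pattern in which the language code is appended to the field name (`label_en`, `description_en`, `labels_en`). As a minimal base-R sketch of that pattern, the following looks up a label by language code. The helper `label_for` is hypothetical and not part of the package, and the German label is made up for illustration.

```r
# Base R only; 'label_for' is a hypothetical helper, not an opendataformat function.
x <- c(1, 3, 3, 2, 1)
attr(x, "label_en") <- "Diagnosis"
attr(x, "label_de") <- "Diagnose"

# Build the attribute name from the field and the language code, then look it up.
label_for <- function(var, language) {
  attr(var, paste0("label_", language), exact = TRUE)
}

label_for(x, "en")
label_for(x, "de")
label_for(x, "fr")  # NULL, because no French label was set
```

Using `exact = TRUE` avoids partial matching of attribute names, which matters when both `label` and `label_en` may be present.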

Examples

# Create a dataframe with 4 variables id, name, age, and diagnosis
exampledata <- data.frame(id = 1:5,
                          name = c("Klaus", "Anna", "Rebecca",
                                   "Kevin", "Janina"),
                          age = c(55, 40, 19, 25, 60), 
                          diagnosis = c(1,3,3,2,1))
# Add metadata for dataset
attr(exampledata, "name") <- "patientdata"
attr(exampledata, "label_en") <- "Patient Data"
attr(exampledata, "description_en") <- "Patient database of the practice Dr. Sommer"
attr(exampledata, "url") <- "www.example.url.en"

# Add metadata for the id and diagnosis variables: label, description and value labels.
attr(exampledata$id, "name") <- "id"
attr(exampledata$id, "label_en") <- "Patient ID"
attr(exampledata$id, "description_en") <- "Practice Patient ID"
attr(exampledata$diagnosis, "name") <- "diagnosis"
attr(exampledata$diagnosis, "label_en") <- "Diagnosis"
attr(exampledata$diagnosis, "description_en") <- "Diagnosis patient last visit"
valuelabels_diagnosis <- 1:4
names(valuelabels_diagnosis) <- c("Covid", "Influenza", "Common cold", "Tonsillitis")
attr(exampledata$diagnosis, "labels_en") <- valuelabels_diagnosis
# use as_odf_tbl() to transform dataframe to odf_tibble
example_odf  <-  as_odf_tbl(exampledata)

# Display metadata using docu_odf
docu_odf(example_odf, style = "print")

# Display metadata of diagnosis Variable
docu_odf(example_odf$diagnosis, style = "print")


data_odf

Description

Example data with attributes specified for the Open Data Format.

Usage

data_odf

Format

A data frame with 20 rows and 7 variables:

bap87: Current Health.
bap9201: Hours of sleep, normal workday.
bap9001: Pressed For Time Last 4 Weeks.
bap9002: Run-down, Melancholy Last 4 Weeks.
bap9003: Well-balanced Last 4 Weeks.
bap96: Height.
name: Firstname.

Source

https://github.com/opendataformat/Specification/tree/main/Example

Get documentation from R data frame.

Description

Get access to information about the dataset and variables via the RStudio Viewer or the web browser.

Usage

docu_odf(
  input,
  languages = "current",
  style = "viewer",
  replace_missing_language = FALSE,
  variables = "yes"
)

Arguments

`input`	R data frame (df) or variable from an R data frame (df$var).
`languages`	Select the language in which the descriptions and labels of the data are displayed. By default the current language is displayed (`languages = "current"`). With `languages = "default"`, the default language is displayed if labels and descriptions without a language tag exist; otherwise the current language is displayed. You can view all available language variants with `languages = "all"`, or select a language by language code, e.g. `languages = "en"`.
`style`	Selects where the output is displayed (console or viewer). By default the metadata is displayed in the viewer if the viewer is available (`style = "viewer"` or `style = "html"`). To print it to the console, use `style = "console"` or `style = "print"`. To display it in both the console and the viewer, use `style = "both"` or `style = "all"`.
`replace_missing_language`	If only one language is specified in `languages` and `replace_missing_language` is TRUE, then for a missing label or description the default or English label/description is displayed additionally (if one of these is available).
`variables`	Indicates whether a list of all variables is displayed with the dataset metadata. Set `variables = "yes"` to display the list of variables. If the input is a variable/column, this argument is ignored.
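
The fallback rule described for `replace_missing_language` can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper `pick_label` is hypothetical, it assumes untagged labels live in an attribute named `label`, and it returns just the fallback text rather than displaying it in addition to the missing entry.

```r
# Hypothetical helper illustrating the fallback rule; not the package's code.
pick_label <- function(var, language, replace_missing_language = FALSE) {
  label <- attr(var, paste0("label_", language), exact = TRUE)
  missing_label <- is.null(label) || !nzchar(label)
  if (!missing_label || !replace_missing_language) {
    return(label)
  }
  # Try the untagged (default) label first, then the English one.
  for (candidate in c("label", "label_en")) {
    fallback <- attr(var, candidate, exact = TRUE)
    if (!is.null(fallback) && nzchar(fallback)) {
      return(fallback)
    }
  }
  label
}

health <- c(1, 2, 3)
attr(health, "label_en") <- "Current Health"
attr(health, "label_de") <- ""

pick_label(health, "de")
pick_label(health, "de", replace_missing_language = TRUE)
```

The first call returns the empty German label unchanged; the second falls back to the English label because no untagged label exists.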

Value

Documentation.

Examples

# get example data from the opendataformat package
df <- get(data("data_odf"))

# view documentation about the dataset in the language that is currently set
docu_odf(df)

# view information from a selected variable in language "en"
docu_odf(df$bap87, languages = "en")

# view dataset information for all available languages
docu_odf(df, languages = "all")

# print information to the R console
docu_odf(df$bap87, style = "print")

# print information to the R viewer
docu_odf(df$bap87, style = "viewer")

# Since the label for language de is missing, in this case the
# english label will be displayed additionally.
attributes(df$bap87)["label_de"] <- ""
docu_odf(df$bap87, languages = "de", replace_missing_language = TRUE)

Get variable labels or other metadata from a data frame in opendataformat.

Description

Retrieve specific metadata, such as variable labels or value labels, from a data frame or from a single variable.

Usage

getmetadata_odf(input, type, language = "active")

Arguments

input

R data frame (df) or variable from an R data frame (df$var).

type

The metadata type you want to retrieve. Possible options are "label", "description", "url", "type", "valuelabels", or "languages".

language

Select the language in which the labels of the variables will be displayed. If no language is selected, the current/active language of the data frame is used (language = "active").

You can select the language by language code, e.g. language = "en".
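
For value labels, the as_odf_tbl() example stores a named vector in which the names are the label texts and the values are the codes. Assuming that layout, a minimal base-R sketch of turning codes into label texts could look like this. The variable names are illustrative, and nothing here calls the package.

```r
# Value labels stored as in the as_odf_tbl() example: names are label texts,
# values are the codes used in the data.
diagnosis <- c(1, 3, 3, 2, 1)
value_labels <- c("Covid" = 1, "Influenza" = 2, "Common cold" = 3, "Tonsillitis" = 4)
attr(diagnosis, "labels_en") <- value_labels

# Find the position of each code among the label values, then take the names.
labels_en <- attr(diagnosis, "labels_en", exact = TRUE)
decoded <- names(labels_en)[match(diagnosis, labels_en)]
decoded
```

Codes without a matching value label would come back as NA, because `match()` returns NA for them.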

Value

Documentation.

Examples

# get example data from the opendataformat package
df <- get(data("data_odf"))
# view the variable labels for all variables in English
getmetadata_odf(input = df, type = "label", language = "en")

# view the value labels for variable bap87 in English
getmetadata_odf(input = df$bap87, type = "valuelabel", language = "en")

# view the description for variable bap87 in English
getmetadata_odf(input = df$bap87, type = "description", language = "en")

Merge method for odf tibbles.

Description

Merge two odf tibbles in R while keeping attributes with metadata.

Usage

## S3 method for class 'odf_tbl'
merge(
  x,
  y,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  all = FALSE,
  all.x = all,
  all.y = all,
  sort = TRUE,
  suffixes = c(".x", ".y"),
  no.dups = TRUE,
  allow.cartesian = getOption("datatable.allow.cartesian"),
  incomparables = NULL,
  ...
)

Arguments

`x`, `y`	odf tibbles, or objects to be coerced to one
`by`	A vector of shared column names in x and y to merge on. This defaults to the shared key columns between the two tables. If y has no key columns, this defaults to the key of x.
`by.x`, `by.y`	Vectors of column names in x and y to merge on.
`all`	logical; all = TRUE is shorthand to save setting both all.x = TRUE and all.y = TRUE.
`all.x`	logical; if TRUE, rows from x which have no matching row in y are included. These rows will have 'NA's in the columns that are usually filled with values from y. The default is FALSE so that only rows with data from both x and y are included in the output.
`all.y`	logical; analogous to all.x above.
`sort`	logical. If TRUE (default), the rows of the merged data.table are sorted by setting the key to the by / by.x columns. If FALSE, unlike base R's merge for which row order is unspecified, the row order in x is retained (including retaining the position of missings when all.x=TRUE), followed by y rows that don't match x (when all.y=TRUE) retaining the order those appear in y.
`suffixes`	A character(2) specifying the suffixes to be used for making non-by column names unique. The suffix behaviour works in a similar fashion as the merge.data.frame method does.
`no.dups`	logical indicating that suffixes are also appended to non-by.y column names in y when they have the same column name as any by.x.
`allow.cartesian`	See allow.cartesian in `data.table`.
`incomparables`	values which cannot be matched and therefore are excluded from by columns.
`...`	Not used at this time.

Details

merge is a generic function in base R. It dispatches to the merge.data.frame, merge.odf_tbl or merge.data.table method depending on the class of its first argument. merge.odf_tbl uses merge.data.table to join the data and then adds the attributes containing metadata from the two original odf data.frames. Note that, unlike an SQL join, NA is matched against NA (and NaN against NaN) while merging. For a more data.table-centric way of merging two data.tables, see data.table. See FAQ 1.11 of data.table for a detailed comparison of merge.
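
The NA matching mentioned above can be seen with plain data frames. The sketch below uses base R's merge.data.frame as a stand-in, since it also matches NA keys by default; it assumes merge.odf_tbl behaves as the Details state and does not call the package.

```r
# Two small data frames whose key column 'k' contains an NA on each side.
x <- data.frame(k = c(1, NA), a = c("x1", "x2"))
y <- data.frame(k = c(NA, 2), b = c("y1", "y2"))

# Inner join on 'k': the only key present on both sides is NA,
# so the NA rows are joined to each other.
m <- merge(x, y, by = "k")
m
```

In SQL, a join condition on NULL keys would not produce this row, which is why keys with missing values deserve a check before merging.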

Value

A new odf tibble built from the two input data.frames, with the variable attributes from the original data.frames. It is sorted by the columns set (or inferred) for the by argument if sort is TRUE. For variables/columns occurring in both x and y, attributes are taken from x.

Examples

# get path to example data from the opendataformat package (data.odf.zip)
path  <-  system.file("extdata", "data.odf.zip", package = "opendataformat")

# read four columns of example data specified as ODF from ZIP file
df  <-  read_odf(file = path, select = 1:4)

# read other columns of example data specified as ODF from ZIP file
df2  <-  read_odf(file = path, select = 4:7)

# generate a variable for joining both datasets:
df$id <- 1:20
df2$id <- 1:20

# merge both datasets by id column
merged_df <- merge(df, df2, by = "id")

# merge both datasets by the shared key columns between the two tables
merged_df2 <- merge(df, df2)

Read data specified as Open Data Format.

Description

Import data from the Open Data Format to an R data frame.

Usage

read_odf(
  file,
  languages = "all",
  nrows = Inf,
  skip = 0,
  select = NULL,
  na.strings = getOption("datatable.na.strings", "NA")
)

Arguments

`file`	The name of the file from which the data are to be read.
`languages`	Select the language variants of the metadata to import, either "all" or a language code, e.g. `languages = "de"`. By default all available language variants are imported (`languages = "all"`).
`nrows`	integer: the maximum number of rows to read in. Negative and other invalid values are ignored.
`skip`	Select the number of rows to be skipped (without the column names).
`select`	A vector of column names or numbers to keep, drop the rest. In all forms of select, order that the columns are specified determines the order of the columns in the result.
`na.strings`	A character vector of strings which are to be interpreted as NA values. By default, ",," for columns of all types, including type character is read as NA for consistency. ,"", is unambiguous and read as an empty string. To read ,NA, as NA, set na.strings="NA". To read ,, as blank string "", set na.strings=NULL. When they occur in the file, the strings in na.strings should not appear quoted since that is how the string literal ,"NA", is distinguished from NA, for example, when na.strings="NA".

Value

R dataframe with attributes including dataset and variable information.
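
The `nrows` and `select` arguments can be combined to read only part of a file. The following is a sketch based on the argument descriptions above, using the example file shipped with the package. The guard with `requireNamespace()` is an addition so that the snippet does nothing when the package is not installed.

```r
if (requireNamespace("opendataformat", quietly = TRUE)) {
  # Path to the example file that ships with the package.
  path <- system.file("extdata", "data.odf.zip", package = "opendataformat")

  # Read at most five rows and keep only the first two columns.
  df_part <- opendataformat::read_odf(file = path, nrows = 5, select = 1:2)
  print(dim(df_part))
}
```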

Examples

# get path to example data from the opendataformat package (data.odf.zip)
path  <-  system.file("extdata", "data.odf.zip", package = "opendataformat")
path

# read example data specified as Open Data Format from ZIP file
df  <-  read_odf(file = path)
attributes(df)
attributes(df$bap87)

# read example data with language selection
df  <-  read_odf(file = path, languages = "de")
attributes(df$bap87)

Change language of dataframe metadata

Description

Changes the active language of a dataframe with metadata for the docu_odf function.

Usage

setlanguage_odf(dataframe, language)

Arguments

`dataframe`	R data frame (df) enriched with metadata in the odf-format.
`language`	Select the language to which you want to switch the metadata.

Value

Dataframe

Examples

# get example data from the opendataformat package
df  <-  get(data("data_odf"))

# Switch dataset df to language "en"
df  <-  setlanguage_odf(df, language = "en")

# Display dataset information for dataset df in language "en"
docu_odf(df)

Write R data frame to the Open Data Format.

Description

Export data from an R data frame to a ZIP file that stores the data as Open Data Format.

Usage

write_odf(
  x,
  file,
  languages = "all",
  export_data = TRUE,
  verbose = TRUE,
  compression_level = 5,
  odf_version = "1.1.0"
)

Arguments

`x`	R data frame (df) to be written.
`file`	Path to the ZIP file, or name of the ZIP file to save the odf dataset in the working directory.
`languages`	Select the language in which the descriptions and labels of the data will be exported. By default all available language variants are exported (`languages = "all"`). You can also export only the default language (`languages = "default"`), only the current language (`languages = "current"`), or a language selected by language code, e.g. `languages = "en"`.
`export_data`	Choose whether to export the file that holds the data (data.csv). By default the data and metadata are exported (`export_data = TRUE`). To export only metadata and no data, select `export_data = FALSE`.
`verbose`	Display more messages.
`compression_level`	A number between 1 and 9. 9 compresses best, but it also takes the longest.
`odf_version`	The ODF version of the output file. Default is the most recent version.

Value

ZIP file and unzipped directory containing the data as a CSV file and the metadata as an XML file (DDI Codebook 2.5).
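
A quick way to check an export is to write a dataset and read it back. The sketch below does this with the documented `write_odf()` and `read_odf()` arguments. The file name `roundtrip.odf.zip` and the `requireNamespace()` guard are assumptions made for illustration, and no particular comparison result is claimed.

```r
if (requireNamespace("opendataformat", quietly = TRUE)) {
  # Load the example dataset that ships with the package.
  df <- get(data("data_odf", package = "opendataformat"))

  # Write to a temporary file, then read the file back in.
  out <- file.path(tempdir(), "roundtrip.odf.zip")
  opendataformat::write_odf(x = df, file = out, verbose = FALSE)
  df_back <- opendataformat::read_odf(file = out)

  # Compare the column names before and after the round trip.
  print(identical(names(df), names(df_back)))
}
```

Writing into `tempdir()` keeps the working directory clean, as the Examples below also do.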

Examples

# get example data from the opendataformat package
df  <-  get(data("data_odf"))

# write R data frame with attributes to the file my_data.zip specified
# as Open Data Format.
write_odf(x = df, paste0(tempdir(), "/my_data.zip"))

# write R data frame with attributes to the file my_data.zip
# with selected language.
write_odf(x = df,  paste0(tempdir(), "/my_data.zip"), languages = "en")

# write R data frame with attributes to the file my_data.zip but only
# metadata, no data.
write_odf(x = df,  file = paste0(tempdir(), "/my_data.zip"), export_data = FALSE)

