Package 'opendataformat'

Title: Reading and Writing Open Data Format Files
Description: The Open Data Format (ODF) is a new, non-proprietary, multilingual, metadata enriched, and zip-compressed data format with metadata structured in the Data Documentation Initiative (DDI) Codebook standard. This package allows reading and writing of data files in the Open Data Format (ODF) in R, and displaying metadata in different languages. For further information on the Open Data Format, see <https://opendataformat.github.io/>.
Authors: Tom Hartl [aut, cre], Claudia Saalbach [ctb]
Maintainer: Tom Hartl <[email protected]>
License: MIT + file LICENSE
Version: 2.0.0
Built: 2024-11-19 07:47:50 UTC
Source: https://github.com/opendataformat/r-package-opendataformat

Help Index


Open Data Format

Description

The package is designed to support the use of the Open Data Format. For this purpose, three main functions have been developed:

read_odf()

Import data from the Open Data Format to an R data frame.

write_odf()

Export data from an R data frame to the Open Data Format.

docu_odf()

Get access to information about the dataset and variables via the RStudio Viewer or the web browser.
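
The three functions above can be combined into one short workflow. The following is a minimal sketch, assuming the package is installed and that the bundled example file data.zip (used in the Examples sections of this manual) is available; the output file name workflow_example.zip is made up for illustration.

```r
library(opendataformat)

# Path to the example ODF file that ships with the package
path <- system.file("extdata", "data.zip", package = "opendataformat")

# 1. Import: ODF ZIP file -> R data frame with metadata attributes
df <- read_odf(file = path)

# 2. Inspect: print the dataset documentation to the R console
docu_odf(df, style = "print")

# 3. Export: R data frame -> ODF ZIP file in a temporary directory
#    (hypothetical file name, chosen only for this sketch)
out <- file.path(tempdir(), "workflow_example.zip")
write_odf(x = df, file = out)
```

Using style = "print" keeps the output in the console, which is convenient in scripts where no viewer is available.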

Author(s)

Tom Hartl ([email protected]), Claudia Saalbach ([email protected])

Other Contributors: KonsortSWD/NFDI, DIW Berlin

See Also

More information about the Open Data Format specification, along with data examples, is available here: https://git.soep.de/opendata/


data_odf

Description

Example data with attributes specified for the Open Data Format.

Usage

data_odf

Format

A data frame with 20 rows and 7 variables:

bap87

Current Health.

bap9201

Hours of sleep, normal workday.

bap9001

Pressed For Time Last 4 Weeks.

bap9002

Run-down, Melancholy Last 4 Weeks.

bap9003

Well-balanced Last 4 Weeks.

bap96

Height.

name

Firstname.

Source

https://github.com/opendataformat/Specification/tree/main/Example
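
As a quick orientation, the example data can be loaded and its structure inspected with base R functions. This is a small sketch that only assumes the package is installed; it relies on the statement elsewhere in this manual that the metadata is stored as attributes of the data frame and of its columns.

```r
library(opendataformat)

# Load the bundled example data (same idiom as in this manual's examples)
df <- get(data("data_odf"))

# Dimensions and variable names (documented above as 20 rows, 7 variables)
dim(df)
names(df)

# Metadata lives in R attributes: list the attribute names at dataset level
# and for one variable
names(attributes(df))
names(attributes(df$bap87))
```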


Get documentation from R data frame.

Description

Get access to information about the dataset and variables via the RStudio Viewer or the web browser.

Usage

docu_odf(
  input,
  languages = "current",
  style = "viewer",
  replace_missing_language = FALSE,
  variables = "yes"
)

Arguments

input

R data frame (df) or variable from an R data frame (df$var).

languages

Select the language in which the descriptions and labels of the data will be displayed.

  • By default the language that is set to current is displayed (languages = "current").

  • The default option displays the default language if labels and descriptions without a language tag exist; otherwise the current language is displayed (languages = "default").

  • You can choose to view all available language variants by selecting languages = "all",

  • or you can select the language by language code, e.g. languages = "en".

style

Selects where the output is displayed (console or viewer). By default the metadata information is displayed in the viewer if the viewer is available.

  • You can choose to display the metadata only in the console (style = "console" or style = "print").

  • You can choose to display the metadata in both the console and the viewer (style = "both" or style = "all").

  • You can choose to display the metadata only in the viewer (style = "viewer" or style = "html").

replace_missing_language

If only one language is specified in languages and replace_missing_language is set to TRUE, then for any missing label or description the default or English label/description is displayed additionally (if one of these is available).

variables

Indicate whether a list of all variables should be displayed together with the dataset metadata. Set variables = "yes" to display the list of variables. If the input is a variable/column, the variables argument is ignored.

Value

Documentation.

Examples

# get example data from the opendataformat package
df <- get(data("data_odf"))

# view documentation about the dataset in the language that is currently set
docu_odf(df)

# view information from a selected variable in language "en"
docu_odf(df$bap87, languages = "en")

# view dataset information for all available languages
docu_odf(df, languages = "all")

# print information to the R console
docu_odf(df$bap87, style = "print")

# display information in the viewer
docu_odf(df$bap87, style = "viewer")

# Since the label for language de is missing, in this case the
# english label will be displayed additionally.
attributes(df$bap87)["label_de"] <- ""
docu_odf(df$bap87, languages = "de", replace_missing_language = TRUE)

Get variable labels or other metadata from a data frame in opendataformat.

Description

Retrieve variable labels or other metadata (descriptions, URLs, types, value labels, or available languages) from a data frame in the Open Data Format or from one of its variables.

Usage

getmetadata_odf(input, type, language = "active")

Arguments

input

R data frame (df) or variable from an R data frame (df$var).

type

The metadata type you want to retrieve. Possible options are "label", "description", "url", "type", "valuelabels", or "languages".

language

Select the language in which the labels of the variables will be displayed. If no language is selected, the current/active language of the data frame will be used.

  • By default the current/active language of the data frame is used (language = "active").

  • You can select the language by language code, e.g. language = "en".

Value

Documentation.

Examples

# get example data from the opendataformat package
df <- get(data("data_odf"))
# view the variable labels for all variables in English
getmetadata_odf(input = df, type = "label", language = "en")

# view the value labels for variable bap87 in English
getmetadata_odf(input = df$bap87, type = "valuelabels", language = "en")

# view the description for variable bap87 in English
getmetadata_odf(input = df$bap87, type = "description", language = "en")
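
The remaining type options listed above can be queried in the same way. The sketch below only uses option names documented in the Arguments section; the structure of the returned objects is not specified in this manual, so the results are simply printed or stored without further assumptions.

```r
library(opendataformat)
df <- get(data("data_odf"))

# Which languages does the metadata provide?
getmetadata_odf(input = df, type = "languages")

# Labels of all variables in English, stored for later use
labels_en <- getmetadata_odf(input = df, type = "label", language = "en")
labels_en

# "type" and "url" metadata recorded for a single variable
getmetadata_odf(input = df$bap87, type = "type")
getmetadata_odf(input = df$bap87, type = "url")
```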

Merge method for odf data.frames.

Description

Merge two odf data.frames in R while keeping attributes with metadata.

Usage

## S3 method for class 'odf'
merge(
  x,
  y,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  all = FALSE,
  all.x = all,
  all.y = all,
  sort = TRUE,
  suffixes = c(".x", ".y"),
  no.dups = TRUE,
  allow.cartesian = getOption("datatable.allow.cartesian"),
  incomparables = NULL,
  ...
)

Arguments

x, y

odf data.frames, or objects to be coerced to one

by

A vector of shared column names in x and y to merge on. This defaults to the shared key columns between the two tables. If y has no key columns, this defaults to the key of x.

by.x, by.y

Vectors of column names in x and y to merge on.

all

logical; all = TRUE is shorthand to save setting both all.x = TRUE and all.y = TRUE.

all.x

logical; if TRUE, rows from x which have no matching row in y are included. These rows will have 'NA's in the columns that are usually filled with values from y. The default is FALSE so that only rows with data from both x and y are included in the output.

all.y

logical; analogous to all.x above.

sort

logical. If TRUE (default), the rows of the merged data.table are sorted by setting the key to the by / by.x columns. If FALSE, unlike base R's merge for which row order is unspecified, the row order in x is retained (including retaining the position of missings when all.x=TRUE), followed by y rows that don't match x (when all.y=TRUE) retaining the order those appear in y.

suffixes

A character(2) specifying the suffixes to be used for making non-by column names unique. The suffix behaviour works in a similar fashion as the merge.data.frame method does.

no.dups

logical indicating that suffixes are also appended to non-by.y column names in y when they have the same column name as any by.x.

allow.cartesian

See allow.cartesian in data.table.

incomparables

values which cannot be matched and therefore are excluded from by columns.

...

Not used at this time.

Details

merge is a generic function in base R. It dispatches to the merge.data.frame, merge.odf, or merge.data.table method depending on the class of its first argument. merge.odf uses merge.data.table to join the data frames and then adds the attributes containing metadata from the two original odf data.frames. Note that, unlike an SQL join, NA is matched against NA (and NaN against NaN) while merging. For a more data.table-centric way of merging two data.tables, see data.table. See FAQ 1.11 for a detailed comparison of merge.

Value

A new odf data.frame built from the two input data.frames with the variable attributes from the original data.frames. It is sorted by the columns set for (or inferred for) the by argument if the argument sort is set to TRUE. For variables/columns occurring in both x and y, attributes are taken from x.

Examples

# get path to example data from the opendataformat package (data.zip)
path <- system.file("extdata", "data.zip", package = "opendataformat")

# read four columns of example data specified as ODF from ZIP file
df <- read_odf(file = path, select = 1:4)

# read other columns of example data specified as ODF from ZIP file
df2 <- read_odf(file = path, select = 4:7)

# generate a variable for joining both datasets:
df$id <- 1:20
df2$id <- 1:20

# merge both datasets by id column
merged_df <- merge(df, df2, by = "id")

# merge both datasets by shared key columns between the two tables
merged_df2 <- merge(df, df2)
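
To check that metadata survives a merge, the attributes of a merged column can be inspected afterwards. The following is a sketch under the assumption that the example file has 20 rows and 7 columns (as documented for data_odf); the object names left, right, and merged are chosen only for illustration.

```r
library(opendataformat)
path <- system.file("extdata", "data.zip", package = "opendataformat")

# Two disjoint column subsets of the same example file
left  <- read_odf(file = path, select = 1:3)
right <- read_odf(file = path, select = 4:7)

# Artificial join key (assumes the example data has 20 rows)
left$id  <- 1:20
right$id <- 1:20

# Merge explicitly on the id column
merged <- merge(left, right, by = "id")
dim(merged)

# List the attribute names of the first variable taken from 'left';
# the variable metadata is expected to be among them
first_var <- names(left)[1]
names(attributes(merged[[first_var]]))
```

Selecting non-overlapping column ranges avoids duplicate variable names, so no suffixes are needed in the result.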

Read data specified as Open Data Format.

Description

Import data from the Open Data Format to an R data frame.

Usage

read_odf(file, languages = "all", nrows = Inf, skip = 0, select = NULL)

Arguments

file

The name of the file from which the data are to be read.

languages

The language variants of the metadata to import. By default all available language variants are imported (languages = "all"). A single language can be selected by language code, e.g. languages = "de".

nrows

integer: the maximum number of rows to read in. Negative and other invalid values are ignored.

skip

Select the number of rows to be skipped (without the column names).

select

A vector of column names or numbers to keep, drop the rest. In all forms of select, order that the columns are specified determines the order of the columns in the result.

Value

R data frame with attributes including dataset and variable information.

Examples

# get path to example data from the opendataformat package (data.zip)
path <- system.file("extdata", "data.zip", package = "opendataformat")
path

# read example data specified as Open Data Format from ZIP file
df <- read_odf(file = path)
attributes(df)
attributes(df$bap87)

# read example data with language selection
df <- read_odf(file = path, languages = "de")
attributes(df$bap87)
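
The nrows, skip, and select arguments allow partial reads of a file. The sketch below assumes that the bundled data.zip contains the variables bap87 and name listed for data_odf; if the column names differ, column numbers can be passed to select instead.

```r
library(opendataformat)
path <- system.file("extdata", "data.zip", package = "opendataformat")

# Read only two named columns and at most the first five rows
small <- read_odf(file = path, select = c("bap87", "name"), nrows = 5)
dim(small)

# Skip the first ten data rows and read the remainder
rest <- read_odf(file = path, skip = 10)
nrow(rest)
```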

Change language of dataframe metadata

Description

Changes the active language of a dataframe with metadata for the docu_odf function.

Usage

setlanguage_odf(dataframe, language)

Arguments

dataframe

R data frame (df) enriched with metadata in the odf-format.

language

Select the language to which you want to switch the metadata.

Value

Dataframe

Examples

# get example data from the opendataformat package
df <- get(data("data_odf"))

# Switch dataset df to language "en"
df <- setlanguage_odf(df, language = "en")

# Display dataset information for dataset df in language "en"
docu_odf(df)
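
Because other functions fall back to the active language, switching it changes what they return when no language is named. This is a brief sketch, assuming the example data carries German ("de") metadata as the docu_odf examples suggest.

```r
library(opendataformat)
df <- get(data("data_odf"))

# Make German the active language of a copy of the data
df_de <- setlanguage_odf(df, language = "de")

# No language argument given: the active language of df_de is used
getmetadata_odf(input = df_de, type = "label")

# The shape of the data is unchanged by the switch
dim(df_de)
```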

Write R data frame to the Open Data Format.

Description

Export data from an R data frame to a ZIP file that stores the data as Open Data Format.

Usage

write_odf(
  x,
  file,
  languages = "all",
  export_data = TRUE,
  verbose = TRUE,
  compression_level = 5
)

Arguments

x

R data frame (df) to be written.

file

Path to ZIP file or name of zip file to save the odf-dataset in the working directory.

languages

Select the language in which the descriptions and labels of the data will be exported.

  • By default all available language variants are exported (languages = "all").

  • You can also choose to export only the default language (languages = "default"),

  • or only the current language (languages = "current"),

  • or you can select the language by language code, e.g. languages = "en".

export_data

Choose whether to export the file that holds the data (data.csv). Default is TRUE.

  • By default the data and metadata are exported (export_data = TRUE).

  • To export only metadata and no data, select export_data = FALSE.

verbose

Display more messages.

compression_level

A number between 1 and 9. 9 compresses best, but it also takes the longest.

Value

ZIP file and unzipped directory containing the data as CSV file and the metadata as XML file (DDI Codebook 2.5).

Examples

# get example data from the opendataformat package
df <- get(data("data_odf"))

# write R data frame with attributes to the file my_data.zip specified
# as Open Data Format.
write_odf(x = df, file = paste0(tempdir(), "/my_data.zip"))

# write R data frame with attributes to the file my_data.zip
# with selected language.
write_odf(x = df, file = paste0(tempdir(), "/my_data.zip"), languages = "en")

# write R data frame with attributes to the file my_data.zip but only
# metadata, no data.
write_odf(x = df, file = paste0(tempdir(), "/my_data.zip"), export_data = FALSE)
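
A simple way to sanity-check an export is to read the written file back in. The following round-trip sketch uses only arguments documented above; the file name roundtrip.zip is hypothetical, and the comparison is limited to column names because this manual does not state which attributes are identical after re-import.

```r
library(opendataformat)
df <- get(data("data_odf"))

# Hypothetical output file in a temporary directory
out <- file.path(tempdir(), "roundtrip.zip")

# Export English metadata only, with maximum compression and fewer messages
write_odf(x = df, file = out, languages = "en",
          verbose = FALSE, compression_level = 9)

# Re-import the file and compare the column names with the original
df_back <- read_odf(file = out)
identical(names(df_back), names(df))
```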