Top-Level Functions
This page documents functions that are accessed directly through the muller module.
Table of Contents
muller.read()
Overview
Utility that reads raw data from supported files into MULLER format. It can recompress data into the format required by the tensor if permitted by the tensor htype, or copy data directly if the file format matches the sample_compression of the tensor to maximize upload speeds.
Parameters
- path (
strorpathlib.Path): Path to a supported file. - verify (
bool, optional): IfTrue, contents of the file are verified. Defaults toFalse. - creds (
dict, optional): Credentials for s3, gcp and http urls. Defaults toNone. - compression (
str, optional): Format of the file. Only required if path does not have an extension. Defaults toNone. - storage (
StorageProvider, optional): Storage provider to use to retrieve remote files. Useful if multiple files are being read from same storage to minimize overhead of creating a new provider. Defaults toNone.
Returns
- Sample: Sample object. Call
sample.arrayto get thenp.ndarray.
Examples
import muller
# Read an image file
sample = muller.read("path/to/image.jpg")
array = sample.array
# Read with verification
sample = muller.read("path/to/data.png", verify=True)
# Read from remote storage with credentials
creds = {"aws_access_key_id": "...", "aws_secret_access_key": "..."}
sample = muller.read("s3://bucket/image.jpg", creds=creds)
# Specify compression explicitly
sample = muller.read("path/to/file", compression="png")
muller.tiled()
Overview
Allocates an empty sample of shape sample_shape, broken into tiles of shape tile_shape (except for edge tiles). This is useful for efficiently storing and accessing large samples by dividing them into smaller tiles.
Parameters
- sample_shape (
Tuple[int, ...]): Full shape of the sample. - tile_shape (
Tuple[int, ...], optional): The sample will be stored as tiles where each tile will have this shape (except edge tiles). If not specified, it will be computed such that each tile is close to half of the tensor'smax_chunk_size(after compression). Defaults toNone. - dtype (
strornp.dtype, optional): Dtype for the sample array. Defaults tonp.uint8.
Returns
- PartialSample: A PartialSample instance which can be appended to a Tensor.
Examples
import muller
import numpy as np
# Create a dataset with an image tensor
ds = muller.dataset("./my_dataset", overwrite=True)
ds.create_tensor("image", htype="image", sample_compression="png")
# Append a tiled sample
with ds:
ds.image.append(muller.tiled(sample_shape=(1003, 1103, 3), tile_shape=(10, 10, 3)))
# Fill part of the tiled sample with data
ds.image[0][-217:, :212, 1:] = np.random.randint(0, 256, (217, 212, 2), dtype=np.uint8)
# Create a tiled sample with default tile shape
tiled_sample = muller.tiled(sample_shape=(5000, 5000, 3))
# Create a tiled sample with specific dtype
tiled_sample = muller.tiled(
sample_shape=(2048, 2048),
tile_shape=(256, 256),
dtype=np.float32
)
muller.Sample
Overview
The Sample class represents a single data sample in MULLER format. It is typically created by the read() function and provides access to the underlying data as a numpy array.
Attributes
- array (
np.ndarray): The numpy array representation of the sample data. - path (
str): The path to the source file (if applicable). - compression (
str): The compression format of the sample.
Examples
import muller
# Create a Sample from a file
sample = muller.read("path/to/image.jpg")
# Access the numpy array
array = sample.array
print(array.shape)
print(array.dtype)
# Use Sample in dataset operations
ds = muller.dataset("./my_dataset", overwrite=True)
ds.create_tensor("images")
with ds:
ds.images.append(sample)
Notes
Sampleobjects are typically created automatically bymuller.read()rather than instantiated directly.- The
arrayproperty provides lazy loading - the data is only loaded when accessed. - Samples can be directly appended to tensors in a dataset.