Text

The core text management data structure, which supports efficient modifications and provides a byte string interface. Text positions are represented as size_t. Valid addresses are in the range [0, text_size(txt)]. An invalid position is denoted by EPOS. Access to the non-contiguous pieces is available by means of an iterator interface or a copy mechanism. Text revisions are tracked in a history graph.

Note

The text is assumed to be encoded in UTF-8.

Load

enum TextLoadMethod

Method used to load existing file content.

Values:

enumerator TEXT_LOAD_AUTO

Automatically choose the best option.

enumerator TEXT_LOAD_READ

Read file content and copy it to an in-memory buffer.

Subsequent changes to the underlying file will have no effect on this text instance.

Note

Load time is linear in the file size.

enumerator TEXT_LOAD_MMAP

Memory map the file from disk.

Uses the file system / virtual memory subsystem as a caching layer.

Note

Load time is (almost) independent of the file size.

Warning

In-place modifications of the underlying file will be reflected in the current text content. In particular, truncation will raise SIGBUS and result in data loss.

Text *text_load(const char *filename)

Create a text instance populated with the given file content.

Note

Equivalent to text_load_method(filename, TEXT_LOAD_AUTO).

Text *text_loadat(int dirfd, const char *filename)
Text *text_load_method(const char *filename, enum TextLoadMethod)

Create a text instance populated with the given file content.

Parameters
  • filename – The name of the file to load; if NULL, an empty text is created.

  • method – How the file content should be loaded.

Returns

The new Text object or NULL in case of an error.

Note

When attempting to load a non-regular file, errno will be set to:

  • EISDIR for a directory.

  • ENOTSUP otherwise.
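As a rough illustration of the note above, the file type check can be sketched with stat(2). The helper name is_loadable_file is hypothetical, and the actual implementation may perform an equivalent check differently (for example with fstat(2) on an already opened descriptor).

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: returns true if path names a regular file.
 * Otherwise returns false with errno set as documented for loading:
 * EISDIR for a directory, ENOTSUP for any other non-regular file
 * (device, FIFO, socket, ...). If stat(2) itself fails, the errno it
 * set (e.g. ENOENT) is left untouched. */
static bool is_loadable_file(const char *path)
{
	struct stat st;
	if (stat(path, &st) == -1)
		return false;
	if (S_ISREG(st.st_mode))
		return true;
	errno = S_ISDIR(st.st_mode) ? EISDIR : ENOTSUP;
	return false;
}
```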

Text *text_loadat_method(int dirfd, const char *filename, enum TextLoadMethod)
void text_free(Text*)

Release all resources associated with this text instance.

State

size_t text_size(const Text*)

Return the size in bytes of the whole text.

struct stat text_stat(const Text*)

Get file information at time of load or last save, whichever happened more recently.

Note

If an empty text instance was created using text_load(NULL) and it has not yet been saved, an all-zero struct stat will be returned.

Returns

See stat(2) for details.

bool text_modified(const Text*)

Query whether the text contains any unsaved modifications.

Modify

bool text_insert(Text*, size_t pos, const char *data, size_t len)

Insert data at the given byte position.

Parameters
  • pos – The absolute byte position.

  • data – The data to insert.

  • len – The length of the data in bytes.

Returns

Whether the insertion succeeded.
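To illustrate why insertion is cheap, here is a minimal piece table sketch: inserting splits at most one existing piece and adds a new one, without moving any text bytes. The Piece and PieceTable types, the fixed MAX_PIECES capacity and the pt_insert/pt_flatten helpers are hypothetical simplifications (for instance, inserted data is referenced rather than copied), not a description of how text_insert is actually implemented.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PIECES 64

/* A piece references len bytes stored elsewhere; the text is the
 * concatenation of all pieces in order. Simplification: inserted data
 * is referenced, not copied, so callers must keep it alive. */
typedef struct {
	const char *data;
	size_t len;
} Piece;

typedef struct {
	Piece pieces[MAX_PIECES];
	size_t count; /* number of pieces in use */
	size_t size;  /* total text length in bytes */
} PieceTable;

/* Insert len bytes at absolute byte position pos. At most one existing
 * piece is split; no text bytes are moved. Returns false for an invalid
 * position or when the fixed piece array could overflow. */
static bool pt_insert(PieceTable *pt, size_t pos, const char *data, size_t len)
{
	if (pos > pt->size || pt->count + 2 > MAX_PIECES)
		return false;
	if (len == 0)
		return true;
	/* Locate piece i and the offset off within it corresponding to pos. */
	size_t i = 0, off = pos;
	while (i < pt->count && off >= pt->pieces[i].len) {
		off -= pt->pieces[i].len;
		i++;
	}
	Piece added = { data, len };
	if (off == 0) {
		/* pos falls on a piece boundary: open a gap at index i. */
		memmove(&pt->pieces[i + 1], &pt->pieces[i],
		        (pt->count - i) * sizeof(Piece));
		pt->pieces[i] = added;
		pt->count += 1;
	} else {
		/* pos falls inside piece i: replace it by head, added, tail. */
		Piece head = { pt->pieces[i].data, off };
		Piece tail = { pt->pieces[i].data + off, pt->pieces[i].len - off };
		memmove(&pt->pieces[i + 3], &pt->pieces[i + 1],
		        (pt->count - i - 1) * sizeof(Piece));
		pt->pieces[i] = head;
		pt->pieces[i + 1] = added;
		pt->pieces[i + 2] = tail;
		pt->count += 2;
	}
	pt->size += len;
	return true;
}

/* Copy the whole text into buf, which must hold pt->size + 1 bytes,
 * and NUL terminate it (for testing/printing only). */
static void pt_flatten(const PieceTable *pt, char *buf)
{
	size_t n = 0;
	for (size_t i = 0; i < pt->count; i++) {
		memcpy(buf + n, pt->pieces[i].data, pt->pieces[i].len);
		n += pt->pieces[i].len;
	}
	buf[n] = '\0';
}
```

For example, inserting "," at position 5 of a single piece holding "hello world" leaves three pieces: "hello", "," and " world".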

bool text_delete(Text*, size_t pos, size_t len)

Delete data at the given byte position.

Parameters
  • pos – The absolute byte position.

  • len – The number of bytes to delete, starting from pos.

Returns

Whether the deletion succeeded.

bool text_delete_range(Text*, const Filerange*)
bool text_printf(Text*, size_t pos, const char *format, ...) __attribute__((format(printf, 3, 4)))
bool text_appendf(Text*, const char *format, ...) __attribute__((format(printf, 2, 3)))

Access

The individual pieces of the text are not necessarily stored in a contiguous memory block. These functions perform a copy to such a region.

bool text_byte_get(const Text*, size_t pos, char *byte)

Get the byte stored at pos.

Parameters
  • pos – The absolute position.

  • byte – Destination address to store the byte.

Returns

Whether pos was valid and byte updated accordingly.

Note

Unlike text_iterator_byte_get() this function does not return an artificial NUL byte at EOF.

size_t text_bytes_get(const Text*, size_t pos, size_t len, char *buf)

Store at most len bytes starting from pos into buf.

Parameters
  • pos – The absolute starting position.

  • len – The length in bytes.

  • buf – The destination buffer.

Returns

The number of bytes (<= len) stored at buf.

Warning

buf will not be NUL terminated.
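Because the text is spread over several pieces, a copy such as text_bytes_get has to walk them in order. The sketch below does this over a hypothetical array of Piece structs, which is an assumption and not the library's internal representation; like text_bytes_get, it returns the number of bytes actually copied and does not NUL terminate buf.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical piece representation: the text is the concatenation of
 * count pieces, each pointing to len bytes stored elsewhere. */
typedef struct {
	const char *data;
	size_t len;
} Piece;

/* Copy at most len bytes starting at absolute position pos into buf.
 * Returns the number of bytes copied, which is smaller than len when the
 * range extends past the end of the text. buf is NOT NUL terminated. */
static size_t pieces_bytes_get(const Piece *pieces, size_t count,
                               size_t pos, size_t len, char *buf)
{
	size_t copied = 0;
	for (size_t i = 0; i < count && copied < len; i++) {
		const Piece *p = &pieces[i];
		if (pos >= p->len) {
			/* The range starts after this piece: skip it. */
			pos -= p->len;
			continue;
		}
		size_t n = p->len - pos; /* bytes available in this piece */
		if (n > len - copied)
			n = len - copied;
		memcpy(buf + copied, p->data + pos, n);
		copied += n;
		pos = 0; /* subsequent pieces are read from their start */
	}
	return copied;
}
```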

char *text_bytes_alloc0(const Text*, size_t pos, size_t len)

Fetch a text range into a newly allocated memory region.

Parameters
  • pos – The absolute starting position.

  • len – The length in bytes.

Returns

A contiguous NUL-terminated buffer holding the requested range, or NULL in case of an error.

Warning

The returned pointer must be freed by the caller.

Iterator

An iterator points to a given text position and provides interfaces to adjust said position or read the underlying byte value. Functions which take a char pointer will generally assign the byte value after the iterator has been updated.

struct Iterator

Iterator used to navigate the buffer content.

Captures the position within a Piece.

Note

Should be treated as an opaque type.

Warning

Any change to the Text will invalidate the iterator state.

Iterator text_iterator_get(const Text*, size_t pos)
bool text_iterator_init(const Text*, Iterator*, size_t pos)
const Text *text_iterator_text(const Iterator*)
bool text_iterator_valid(const Iterator*)
bool text_iterator_has_next(const Iterator*)
bool text_iterator_has_prev(const Iterator*)
bool text_iterator_next(Iterator*)
bool text_iterator_prev(Iterator*)

Byte

Note

For a read attempt at EOF (i.e. at position text_size(txt)), an artificial NUL byte, which is not actually part of the file, is returned.

bool text_iterator_byte_get(const Iterator*, char *b)
bool text_iterator_byte_prev(Iterator*, char *b)
bool text_iterator_byte_next(Iterator*, char *b)
bool text_iterator_byte_find_prev(Iterator*, char b)
bool text_iterator_byte_find_next(Iterator*, char b)
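The EOF behaviour can be sketched with a simplified iterator over a single contiguous buffer. ByteIter and the iter_* functions are hypothetical; the real Iterator walks a chain of pieces and should be treated as opaque. Returning false when asked to move past either end is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iterator over one contiguous buffer. */
typedef struct {
	const char *data;
	size_t size; /* text size; valid positions are 0..size */
	size_t pos;
} ByteIter;

/* Read the byte at the current position. At EOF (pos == size) an
 * artificial NUL byte, which is not part of the text, is returned. */
static bool iter_byte_get(const ByteIter *it, char *b)
{
	if (it->pos > it->size)
		return false;
	*b = it->pos == it->size ? '\0' : it->data[it->pos];
	return true;
}

/* Advance by one byte, then store the byte at the new position. */
static bool iter_byte_next(ByteIter *it, char *b)
{
	if (it->pos >= it->size)
		return false;
	it->pos++;
	return iter_byte_get(it, b);
}

/* Step back by one byte, then store the byte at the new position. */
static bool iter_byte_prev(ByteIter *it, char *b)
{
	if (it->pos == 0)
		return false;
	it->pos--;
	return iter_byte_get(it, b);
}
```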

Codepoint

These functions advance to the next/previous leading byte of a UTF-8 encoded Unicode codepoint by skipping over all continuation bytes of the form 10xxxxxx.

bool text_iterator_codepoint_next(Iterator *it, char *c)
bool text_iterator_codepoint_prev(Iterator *it, char *c)
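The continuation byte test and the resulting forward and backward scans can be sketched on a plain byte buffer. The function names are hypothetical, and unlike the documented functions they update a position instead of an Iterator and do not report the byte value.

```c
#include <stdbool.h>
#include <stddef.h>

/* True for UTF-8 continuation bytes of the form 10xxxxxx. */
static bool is_continuation(unsigned char c)
{
	return (c & 0xC0) == 0x80;
}

/* Advance *pos to the next leading byte (or to size at EOF) by skipping
 * all continuation bytes. Returns false if already at EOF. */
static bool codepoint_next(const char *s, size_t size, size_t *pos)
{
	if (*pos >= size)
		return false;
	size_t p = *pos + 1;
	while (p < size && is_continuation((unsigned char)s[p]))
		p++;
	*pos = p;
	return true;
}

/* Move *pos back to the previous leading byte. Returns false at the start. */
static bool codepoint_prev(const char *s, size_t *pos)
{
	if (*pos == 0)
		return false;
	size_t p = *pos - 1;
	while (p > 0 && is_continuation((unsigned char)s[p]))
		p--;
	*pos = p;
	return true;
}
```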

Grapheme Clusters

These functions advance to the next/previous grapheme cluster.

Note

The grapheme cluster boundaries are currently not implemented according to UAX#29 rules. Instead, a base character followed by arbitrarily many combining characters, as reported by wcwidth(3), is skipped.

bool text_iterator_char_next(Iterator*, char *c)
bool text_iterator_char_prev(Iterator*, char *c)

Lines

Translate between 1-based line numbers and 0-based byte offsets.

size_t text_pos_by_lineno(Text*, size_t lineno)
size_t text_lineno_by_pos(Text*, size_t pos)
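On a contiguous buffer, the translation amounts to counting newline characters. The helpers below are a hypothetical sketch; EPOS is defined locally as an assumed stand-in, and returning it for line 0, for lines past the end, or for positions beyond the text is a choice of this sketch rather than documented behaviour of text_pos_by_lineno and text_lineno_by_pos.

```c
#include <stddef.h>

#define EPOS ((size_t)-1) /* assumed stand-in for the invalid position */

/* 0-based byte offset of the start of 1-based line lineno. */
static size_t pos_by_lineno(const char *s, size_t size, size_t lineno)
{
	if (lineno == 0)
		return EPOS;
	size_t pos = 0;
	/* Skip lineno - 1 newline characters. */
	for (size_t line = 1; line < lineno; line++) {
		while (pos < size && s[pos] != '\n')
			pos++;
		if (pos == size)
			return EPOS; /* fewer lines than requested */
		pos++; /* step over the '\n' */
	}
	return pos;
}

/* 1-based number of the line containing byte offset pos (pos == size allowed). */
static size_t lineno_by_pos(const char *s, size_t size, size_t pos)
{
	if (pos > size)
		return EPOS;
	size_t lineno = 1;
	for (size_t i = 0; i < pos; i++)
		if (s[i] == '\n')
			lineno++;
	return lineno;
}
```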

History

Interfaces to the history graph.

bool text_snapshot(Text*)

Create a text snapshot, that is, a vertex in the history graph.

size_t text_undo(Text*)

Revert to previous snapshot along the main branch.

Note

Takes an implicit snapshot.

Returns

The position of the first change, or EPOS if already at the oldest state, i.e. there was nothing to undo.

size_t text_redo(Text*)

Reapply a previously undone change along the main branch.

Note

Takes an implicit snapshot.

Returns

The position of the first change, or EPOS if already at the newest state, i.e. there was nothing to redo.
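The return value contract of undo and redo can be illustrated with a toy linear history that stores a full copy of the text per snapshot. This is a hypothetical simplification: it ignores branches, implicit snapshots and the change records of the real history graph, and it approximates "the position of the first change" by the first byte at which the two states differ. EPOS is again an assumed local stand-in.

```c
#include <stddef.h>

#define EPOS ((size_t)-1) /* assumed stand-in for the invalid position */
#define MAX_STATES 16

/* Toy linear history: one full NUL-terminated text per snapshot. */
typedef struct {
	const char *states[MAX_STATES];
	size_t count;   /* number of recorded states */
	size_t current; /* index of the active state */
} History;

/* Offset of the first byte at which a and b differ. */
static size_t first_difference(const char *a, const char *b)
{
	size_t i = 0;
	while (a[i] && a[i] == b[i])
		i++;
	return i;
}

/* Step back one state; EPOS if already at the oldest state. */
static size_t history_undo(History *h)
{
	if (h->current == 0)
		return EPOS;
	h->current--;
	return first_difference(h->states[h->current], h->states[h->current + 1]);
}

/* Step forward one state; EPOS if already at the newest state. */
static size_t history_redo(History *h)
{
	if (h->current + 1 >= h->count)
		return EPOS;
	h->current++;
	return first_difference(h->states[h->current - 1], h->states[h->current]);
}
```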

size_t text_earlier(Text*)
size_t text_later(Text*)
size_t text_restore(Text*, time_t)

Restore the text to the state closest to the time given.

time_t text_state(const Text*)

Get creation time of current state.

Note

TODO: This is currently not the same as the time of the last snapshot.

Marks

A mark keeps track of a text position. Subsequent text changes will update all marks placed after the modification point. Reverting to an older text state will hide all affected marks, redoing the changes will restore them.

Warning

Due to an optimization, cached modifications (i.e. when no text_snapshot was performed between setting the mark and issuing the changes) might not adjust mark positions accurately.

typedef uintptr_t Mark

A mark.

EMARK

An invalid mark, lookup of which will yield EPOS.

Mark text_mark_set(Text*, size_t pos)

Set a mark.

Note

Setting a mark at text_size(txt) will always return the current text size upon lookup.

Parameters
  • pos – The position at which to store the mark.

Returns

The mark or EMARK if an invalid position was given.

size_t text_mark_get(const Text*, Mark)

Lookup a mark.

Parameters
  • mark – The mark to look up.

Returns

The byte position or EPOS for an invalid mark.
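How mark positions follow edits can be sketched by storing plain byte positions and adjusting them after each change. The Marks type and the marks_* functions are hypothetical (the documented Mark is an opaque uintptr_t), and invalidating marks that fall inside a deleted range is an assumption of this sketch. Shifting marks at or after the insertion point keeps a mark set at text_size(txt) at the end of the text, in line with the note on text_mark_set.

```c
#include <stddef.h>

#define EPOS ((size_t)-1) /* assumed stand-in for the invalid position */
#define MAX_MARKS 8

/* Toy mark table holding one byte position per mark. */
typedef struct {
	size_t pos[MAX_MARKS];
	size_t count;
} Marks;

/* After inserting len bytes at pos, shift every mark at or after pos. */
static void marks_insert(Marks *m, size_t pos, size_t len)
{
	for (size_t i = 0; i < m->count; i++)
		if (m->pos[i] != EPOS && m->pos[i] >= pos)
			m->pos[i] += len;
}

/* After deleting len bytes at pos, marks inside the deleted range become
 * invalid (assumption) and marks after it shift left by len. */
static void marks_delete(Marks *m, size_t pos, size_t len)
{
	for (size_t i = 0; i < m->count; i++) {
		if (m->pos[i] == EPOS || m->pos[i] < pos)
			continue;
		m->pos[i] = m->pos[i] < pos + len ? EPOS : m->pos[i] - len;
	}
}
```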

Save

enum TextSaveMethod

Method used to save the text.

Values:

enumerator TEXT_SAVE_AUTO

Automatically choose the best option.

enumerator TEXT_SAVE_ATOMIC

Save file atomically using rename(2).

Creates a temporary file and restores all important metadata before moving it atomically to its final (possibly already existing) destination using rename(2). For new files, permissions are set to 0666 & ~umask.

Warning

This approach does not work if:

  • The file is a symbolic link.

  • The file is a hard link.

  • File ownership cannot be preserved.

  • File group cannot be preserved.

  • Directory permissions do not allow creation of a new file.

  • POSIX ACL cannot be preserved (if enabled).

  • SELinux security context cannot be preserved (if enabled).
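The core of the atomic approach can be sketched with POSIX calls: write a temporary file in the same directory, flush it with fsync(2), then rename(2) it over the destination. save_atomic is a hypothetical helper; it omits the metadata preservation and permission handling described above (mkstemp(3) creates the file with mode 0600), so it only illustrates the rename step and is not equivalent to TEXT_SAVE_ATOMIC.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: atomically replace path with len bytes of data.
 * The temporary file is created next to path so that rename(2) stays on
 * one file system, where it atomically replaces the destination.
 * Ownership, group, ACLs, SELinux context and 0666 & ~umask permissions
 * are NOT handled. */
static bool save_atomic(const char *path, const char *data, size_t len)
{
	char tmp[4096];
	int r = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
	if (r < 0 || (size_t)r >= sizeof tmp)
		return false;
	int fd = mkstemp(tmp);
	if (fd == -1)
		return false;
	size_t done = 0;
	while (done < len) {
		ssize_t n = write(fd, data + done, len - done);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		done += (size_t)n;
	}
	/* Make sure the new content is on disk before it becomes visible. */
	if (fsync(fd) == -1)
		goto fail;
	if (close(fd) == -1) {
		unlink(tmp);
		return false;
	}
	/* Atomic swap: readers see either the old or the new file. */
	if (rename(tmp, path) == -1) {
		unlink(tmp);
		return false;
	}
	return true;
fail:
	close(fd);
	unlink(tmp);
	return false;
}
```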

enumerator TEXT_SAVE_INPLACE

Overwrite file in place.

Warning

I/O failure might cause data loss.

bool text_save(Text*, const char *filename)

Save the whole text to the given file name.

Note

Equivalent to text_save_method(txt, filename, TEXT_SAVE_AUTO).

bool text_saveat(Text*, int dirfd, const char *filename)
bool text_save_method(Text*, const char *filename, enum TextSaveMethod)

Save the whole text to the given file name, using the specified method.

bool text_saveat_method(Text*, int dirfd, const char *filename, enum TextSaveMethod)
TextSave *text_save_begin(Text*, int dirfd, const char *filename, enum TextSaveMethod)

Set up a sequence of write operations.

The returned TextSave pointer can be used to write multiple, possibly non-contiguous, file ranges.

Warning

For every call to text_save_begin there must be exactly one matching call to either text_save_commit or text_save_cancel to release the underlying resources.

ssize_t text_save_write_range(TextSave*, const Filerange*)

Write file range.

Returns

The number of bytes written or -1 in case of an error.

bool text_save_commit(TextSave*)

Commit changes to disk.

Returns

Whether changes have been saved.

Note

Releases the underlying resources and frees the given TextSave pointer which must no longer be used.

void text_save_cancel(TextSave*)

Abort a save operation.

Note

Is not guaranteed to undo previous writes (they might have been performed in-place). However, it releases the underlying resources and frees the given TextSave pointer, which must no longer be used.

ssize_t text_write(const Text*, int fd)

Write whole text content to file descriptor.

Returns

The number of bytes written or -1 in case of an error.

ssize_t text_write_range(const Text*, const Filerange*, int fd)

Write file range to file descriptor.

Returns

The number of bytes written or -1 in case of an error.
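Since write(2) may transfer fewer bytes than requested, writing a range typically needs a retry loop, applied piece by piece. write_all, write_pieces and the Piece type are hypothetical; the sketch only mirrors the documented return convention (bytes written, or -1 on error).

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical piece: len bytes stored at data. */
typedef struct {
	const char *data;
	size_t len;
} Piece;

/* Write all len bytes of buf to fd, retrying after partial writes and
 * EINTR. Returns the number of bytes written, or -1 on error. */
static ssize_t write_all(int fd, const char *buf, size_t len)
{
	size_t done = 0;
	while (done < len) {
		ssize_t n = write(fd, buf + done, len - done);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/* Write the pieces in order; -1 as soon as any write fails. */
static ssize_t write_pieces(int fd, const Piece *pieces, size_t count)
{
	size_t total = 0;
	for (size_t i = 0; i < count; i++) {
		ssize_t n = write_all(fd, pieces[i].data, pieces[i].len);
		if (n == -1)
			return -1;
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```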

Miscellaneous

bool text_mmaped(const Text*, const char *ptr)

Check whether ptr is part of a memory mapped region associated with this text instance.