Text¶
The core text management data structure, which supports efficient
modifications and provides a byte string interface. Text positions
are represented as size_t. Valid addresses are in the range
[0, text_size(txt)]. An invalid position is denoted by EPOS. Access to
the non-contiguous pieces is available by means of an iterator interface
or a copy mechanism. Text revisions are tracked in a history graph.
Note
The text is assumed to be encoded in UTF-8.
Load¶
-
enum TextLoadMethod¶
Method used to load existing file content.
Values:
-
enumerator TEXT_LOAD_AUTO¶
Automatically choose the best option.
-
enumerator TEXT_LOAD_READ¶
Read file content and copy it to an in-memory buffer.
Subsequent changes to the underlying file will have no effect on this text instance.
Note
Load time is linear in the file size.
-
enumerator TEXT_LOAD_MMAP¶
Memory map the file from disk.
Use file system / virtual memory subsystem as a caching layer.
Note
Load time is (almost) independent of the file size.
Warning
In-place modifications of the underlying file will be reflected in the current text content. In particular, truncation will raise SIGBUS and result in data loss.
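The trade-off between the two explicit load methods can be illustrated with plain POSIX calls. The following is a minimal sketch under stated assumptions, not the library's implementation: `load_read` and `load_mmap` are hypothetical helper names, and error handling is reduced to returning NULL.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy the whole file into a heap buffer.
 * The cost is linear in the file size, but later changes to the
 * file do not affect the returned buffer. Free it with free(). */
char *load_read(const char *path, size_t *size) {
	int fd = open(path, O_RDONLY);
	if (fd == -1)
		return NULL;
	struct stat st;
	if (fstat(fd, &st) == -1) {
		close(fd);
		return NULL;
	}
	size_t len = (size_t)st.st_size;
	char *buf = malloc(len ? len : 1);
	size_t off = 0;
	while (buf && off < len) {
		ssize_t n = read(fd, buf + off, len - off);
		if (n <= 0) {           /* error or unexpected EOF */
			free(buf);
			buf = NULL;
		} else {
			off += (size_t)n;
		}
	}
	close(fd);
	if (buf)
		*size = len;
	return buf;
}

/* Hypothetical helper: map the file read-only. Set-up cost is nearly
 * independent of the file size, but the pages stay backed by the file,
 * so truncating it afterwards can raise SIGBUS on access.
 * Release the mapping with munmap(). */
char *load_mmap(const char *path, size_t *size) {
	int fd = open(path, O_RDONLY);
	if (fd == -1)
		return NULL;
	struct stat st;
	if (fstat(fd, &st) == -1 || st.st_size == 0) {
		close(fd);
		return NULL;            /* mmap(2) rejects zero-length mappings */
	}
	void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);                  /* the mapping stays valid after close */
	if (map == MAP_FAILED)
		return NULL;
	*size = (size_t)st.st_size;
	return map;
}
```

Both helpers return the same bytes for an unchanged file; they differ only in when the data is read and in how they react to later changes of the file.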
-
Text *text_load(const char *filename)¶
Create a text instance populated with the given file content.
Note
Equivalent to text_load_method(filename, TEXT_LOAD_AUTO).
-
Text *text_loadat(int dirfd, const char *filename)¶
-
Text *text_load_method(const char *filename, enum TextLoadMethod)¶
Create a text instance populated with the given file content.
- Parameters
filename – The name of the file to load. If NULL, an empty text is created.
method – How the file content should be loaded.
- Returns
The new Text object or NULL in case of an error.
Note
When attempting to load a non-regular file, errno will be set to:
EISDIR for a directory.
ENOTSUP otherwise.
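The errno convention for non-regular files described above can be sketched with stat(2). This is an illustrative assumption about how such a check might look, not the library's code; `is_loadable` is a hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: returns true if `path` names a regular file.
 * Otherwise returns false with errno set to EISDIR for a directory,
 * ENOTSUP for any other file type, or the error reported by stat(2). */
bool is_loadable(const char *path) {
	struct stat st;
	if (stat(path, &st) == -1)
		return false;           /* errno set by stat(2), e.g. ENOENT */
	if (S_ISREG(st.st_mode))
		return true;
	errno = S_ISDIR(st.st_mode) ? EISDIR : ENOTSUP;
	return false;
}
```

A caller can then distinguish "is a directory" from "is some other special file" by inspecting errno after a failed load.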
-
Text *text_loadat_method(int dirfd, const char *filename, enum TextLoadMethod)¶
-
void text_free(Text*)¶
Release all resources associated with this text instance.
State¶
-
size_t text_size(const Text*)¶
Return the size in bytes of the whole text.
-
struct stat text_stat(const Text*)¶
Get file information at time of load or last save, whichever happened more recently.
Note
If an empty text instance was created using text_load(NULL) and it has not yet been saved, an all-zero struct stat will be returned.
- Returns
See stat(2) for details.
-
bool text_modified(const Text*)¶
Query whether the text contains any unsaved modifications.
Modify¶
-
bool text_insert(Text*, size_t pos, const char *data, size_t len)¶
Insert data at the given byte position.
- Parameters
pos – The absolute byte position.
data – The data to insert.
len – The length of the data in bytes.
- Returns
Whether the insertion succeeded.
-
bool text_delete(Text*, size_t pos, size_t len)¶
Delete data at the given byte position.
- Parameters
pos – The absolute byte position.
len – The number of bytes to delete, starting from pos.
- Returns
Whether the deletion succeeded.
-
bool text_delete_range(Text*, const Filerange*)¶
-
bool text_printf(Text*, size_t pos, const char *format, ...)¶
-
bool text_appendf(Text*, const char *format, ...)¶
Access¶
The individual pieces of the text are not necessarily stored in a contiguous memory block. These functions perform a copy to such a region.
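To illustrate what copying out of non-contiguous storage involves, here is a minimal sketch. The `Span` type and `spans_copy` function are hypothetical and stand in for the library's internal pieces; the real data structure is not shown in this document.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical piece: a run of bytes living somewhere in memory. */
typedef struct {
	const char *data;
	size_t len;
} Span;

/* Copy at most `len` bytes starting at absolute offset `pos` from the
 * chain of spans into `buf`. Returns the number of bytes copied, which
 * is smaller than `len` if the chain ends first.
 * `buf` is not NUL terminated. */
size_t spans_copy(const Span *spans, size_t count, size_t pos, size_t len, char *buf) {
	size_t copied = 0;
	for (size_t i = 0; i < count && copied < len; i++) {
		if (pos >= spans[i].len) {  /* range starts after this span */
			pos -= spans[i].len;
			continue;
		}
		size_t n = spans[i].len - pos;
		if (n > len - copied)
			n = len - copied;
		memcpy(buf + copied, spans[i].data + pos, n);
		copied += n;
		pos = 0;                    /* later spans are read from their start */
	}
	return copied;
}
```

The return value plays the same role as that of text_bytes_get: it can be less than the requested length when the range extends past the end of the text.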
-
bool text_byte_get(const Text*, size_t pos, char *byte)¶
Get the byte stored at pos.
- Parameters
pos – The absolute position.
byte – Destination address to store the byte.
- Returns
Whether pos was valid and byte updated accordingly.
Note
Unlike text_iterator_byte_get() this function does not return an artificial NUL byte at EOF.
-
size_t text_bytes_get(const Text*, size_t pos, size_t len, char *buf)¶
Store at most len bytes starting from pos into buf.
- Parameters
pos – The absolute starting position.
len – The length in bytes.
buf – The destination buffer.
- Returns
The number of bytes (<= len) stored at buf.
Warning
buf will not be NUL terminated.
-
char *text_bytes_alloc0(const Text*, size_t pos, size_t len)¶
Fetch text range into a newly allocated memory region.
- Parameters
pos – The absolute starting position.
len – The length in bytes.
- Returns
A contiguous NUL terminated buffer holding the requested range, or NULL in case of an error.
Warning
The returned pointer must be freed by the caller.
Iterator¶
An iterator points to a given text position and provides interfaces to
adjust said position or read the underlying byte value. Functions which
take a char pointer will generally assign the byte value after
the iterator was updated.
-
struct Iterator¶
Iterator used to navigate the buffer content.
Captures the position within a Piece.
Note
Should be treated as an opaque type.
Warning
Any change to the Text will invalidate the iterator state.
Byte¶
Note
For a read attempt at EOF (i.e. text_size) an artificial NUL
byte which is not actually part of the file is returned.
Codepoint¶
These functions advance to the next/previous leading byte of a UTF-8
encoded Unicode codepoint by skipping over all continuation bytes of
the form 10xxxxxx.
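The continuation-byte rule can be sketched on a plain contiguous buffer. The function names below are hypothetical and operate on a `char` array rather than on an Iterator; they only illustrate the skipping logic and assume valid UTF-8 input.

```c
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char c) {
	return (c & 0xC0) == 0x80;
}

/* Offset of the next codepoint's leading byte after `pos`,
 * or `len` if there is none. */
size_t codepoint_next(const char *buf, size_t len, size_t pos) {
	if (pos >= len)
		return len;
	pos++;
	while (pos < len && is_continuation((unsigned char)buf[pos]))
		pos++;
	return pos;
}

/* Offset of the previous codepoint's leading byte before `pos`,
 * or 0 if already at the start. */
size_t codepoint_prev(const char *buf, size_t len, size_t pos) {
	if (pos > len)
		pos = len;
	if (pos == 0)
		return 0;
	pos--;
	while (pos > 0 && is_continuation((unsigned char)buf[pos]))
		pos--;
	return pos;
}
```

For the four-byte string "a", "é" (two bytes), "z", stepping forward from offset 0 visits offsets 1, 3 and 4, so the second byte of "é" is never a stopping point.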
Grapheme Clusters¶
These functions advance to the next/previous grapheme cluster.
Note
The grapheme cluster boundaries are currently not implemented
according to UAX#29 rules.
Instead, a base character followed by arbitrarily many combining
characters, as reported by wcwidth(3), is skipped.
Lines¶
Translate between 1-based line numbers and 0-based byte offsets.
-
size_t text_pos_by_lineno(Text*, size_t lineno)¶
-
size_t text_lineno_by_pos(Text*, size_t pos)¶
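The relationship between 1-based line numbers and 0-based byte offsets can be shown on a contiguous buffer. This is a naive linear-scan sketch with hypothetical names; `NOPOS` is an assumed sentinel for this example only, and the behaviour for out-of-range input is an assumption rather than something the documentation above specifies.

```c
#include <stddef.h>
#include <string.h>

#define NOPOS ((size_t)-1)  /* hypothetical "no such position" sentinel */

/* Byte offset (0-based) at which line `lineno` (1-based) starts,
 * or NOPOS if the buffer has fewer lines. */
size_t pos_by_lineno(const char *buf, size_t len, size_t lineno) {
	if (lineno == 0)
		return NOPOS;
	size_t pos = 0;
	for (size_t line = 1; line < lineno; line++) {
		const char *nl = memchr(buf + pos, '\n', len - pos);
		if (!nl)
			return NOPOS;
		pos = (size_t)(nl - buf) + 1;   /* the line starts after the newline */
	}
	return pos;
}

/* Line number (1-based) of the line containing byte offset `pos`.
 * Offsets past the end are clamped to `len`. */
size_t lineno_by_pos(const char *buf, size_t len, size_t pos) {
	if (pos > len)
		pos = len;
	size_t lineno = 1;
	for (size_t i = 0; i < pos; i++)
		if (buf[i] == '\n')
			lineno++;
	return lineno;
}
```

Note that a newline byte belongs to the line it terminates: in "one\ntwo", offset 3 is still on line 1 and offset 4 is the start of line 2.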
History¶
Interfaces to the history graph.
-
bool text_snapshot(Text*)¶
Create a text snapshot, that is a vertex in the history graph.
-
size_t text_undo(Text*)¶
Revert to previous snapshot along the main branch.
Note
Takes an implicit snapshot.
- Returns
The position of the first change or EPOS, if already at the oldest state, i.e. there was nothing to undo.
-
size_t text_redo(Text*)¶
Reapply an older change along the main branch.
Note
Takes an implicit snapshot.
- Returns
The position of the first change or EPOS, if already at the newest state, i.e. there was nothing to redo.
-
size_t text_earlier(Text*)¶
-
size_t text_later(Text*)¶
-
size_t text_restore(Text*, time_t)¶
Restore the text to the state closest to the time given.
-
time_t text_state(const Text*)¶
Get creation time of current state.
Note
TODO: This is currently not the same as the time of the last snapshot.
Marks¶
A mark keeps track of a text position. Subsequent text changes will update all marks placed after the modification point. Reverting to an older text state will hide all affected marks, redoing the changes will restore them.
Warning
Due to an optimization, cached modifications (i.e. no text_snapshot
was performed between setting the mark and issuing the changes) might
not adjust mark positions accurately.
-
typedef uintptr_t Mark¶
A mark.
-
EMARK¶
An invalid mark, lookup of which will yield EPOS.
Save¶
-
enum TextSaveMethod¶
Method used to save the text.
Values:
-
enumerator TEXT_SAVE_AUTO¶
Automatically choose the best option.
-
enumerator TEXT_SAVE_ATOMIC¶
Save file atomically using rename(2).
Creates a temporary file and restores all important metadata before moving it atomically to its final (possibly already existing) destination using rename(2). For new files, permissions are set to 0666 & ~umask.
Warning
This approach does not work if:
The file is a symbolic link.
The file is a hard link.
File ownership can not be preserved.
File group can not be preserved.
Directory permissions do not allow creation of a new file.
POSIX ACL can not be preserved (if enabled).
SELinux security context can not be preserved (if enabled).
-
enumerator TEXT_SAVE_INPLACE¶
Overwrite file in place.
Warning
I/O failure might cause data loss.
-
bool text_save(Text*, const char *filename)¶
Save the whole text to the given file name.
Note
Equivalent to text_save_method(txt, filename, TEXT_SAVE_AUTO).
-
bool text_saveat(Text*, int dirfd, const char *filename)¶
-
bool text_save_method(Text*, const char *filename, enum TextSaveMethod)¶
Save the whole text to the given file name, using the specified method.
-
bool text_saveat_method(Text*, int dirfd, const char *filename, enum TextSaveMethod)¶
-
TextSave *text_save_begin(Text*, int dirfd, const char *filename, enum TextSaveMethod)¶
Set up a sequence of write operations.
The returned TextSave pointer can be used to write multiple, possibly non-contiguous, file ranges.
Warning
For every call to text_save_begin there must be exactly one matching call to either text_save_commit or text_save_cancel to release the underlying resources.
-
ssize_t text_save_write_range(TextSave*, const Filerange*)¶
Write file range.
- Returns
The number of bytes written or -1 in case of an error.
-
bool text_save_commit(TextSave*)¶
Commit changes to disk.
- Returns
Whether changes have been saved.
Note
Releases the underlying resources and frees the given TextSave pointer, which must no longer be used.
-
void text_save_cancel(TextSave*)¶
Abort a save operation.
Note
Does not guarantee to undo the previous writes (they might have been performed in-place). However, it releases the underlying resources and frees the given TextSave pointer, which must no longer be used.
-
ssize_t text_write(const Text*, int fd)¶
Write whole text content to file descriptor.
- Returns
The number of bytes written or -1 in case of an error.
-
ssize_t text_write_range(const Text*, const Filerange*, int fd)¶
Write file range to file descriptor.
- Returns
The number of bytes written or -1 in case of an error.
Miscellaneous¶
-
bool text_mmaped(const Text*, const char *ptr)¶
Check whether ptr is part of a memory mapped region associated with this text instance.