README

go-textmate

A Go implementation of a TextMate grammar tokenizer. It can load .tmLanguage.json grammars, compile them into an internal rule tree, and tokenize source text into scoped tokens. This is useful for syntax highlighting or code analysis.

Features

  • Load and compile TextMate grammars (.tmLanguage.json)
  • Support for:
    • match, begin/end blocks
    • captures, beginCaptures, endCaptures
    • include (#repo, $self, source.*)
  • Tokenizer with proper stack-based push/pop rules
  • Tokens carry:
    • Scope (TextMate scope name)
    • Start and Length
    • Depth (nesting depth, for overlapping tokens)
  • Mapper utility to iterate over tokens efficiently
  • Written in idiomatic Go, no C dependencies

Installation

Install the package with:
% go get github.com/friedelschoen/go-textmate

Usage

Load a grammar
grammar, err := textmate.LoadGrammar("grammars/go.tmLanguage.json")
if err != nil {
    panic(err)
}
Tokenize text
f, err := os.Open("example.go")
if err != nil {
    panic(err)
}
defer f.Close()
tokens, err := grammar.TokenizeReader(f)
if err != nil {
    panic(err)
}
for _, tok := range tokens {
    fmt.Printf("%s: %d..%d\n", tok.Scope, tok.Start, tok.End())
}
Using Mapper
mapper := make(textmate.Mapper, fileSize) // fileSize: length of the input in bytes
for _, tok := range tokens {
    mapper.Add(tok)
}
for pos, scopes := range mapper.Iter() {
    fmt.Println(pos, scopes)
}

License

Zlib License.

Documentation

Overview

Package textmate tokenizes source files using TextMate grammars, intended for syntax highlighting. Workflow:
1) Parse the JSON grammar into an internal rule tree (MatchRule).
2) The tokenizer walks the rules and emits scoped tokens.

Constants

This section is empty.

Variables

var (
	ErrScopeName = errors.New("unexpected `scopeName`")
)
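
ErrScopeName reports an unexpected scopeName, presumably returned when a grammar's scopeName fails validation against its filename (see CompileGrammar and LoadGrammar).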
var GrammarExtension = ".tmLanguage.json"

GrammarExtension is the expected extension for grammar files (used for "source.*" includes).
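
For illustration, a hedged sketch of the mapping this suggests: an include of "source.js" would presumably resolve to js.tmLanguage.json inside the grammar directory. resolveInclude is a hypothetical helper, not part of the package (imports: path/filepath, strings):

// resolveInclude is a hypothetical helper showing the assumed mapping
// of a "source.*" include onto a grammar file on disk.
func resolveInclude(dirname, include string) string {
	lang := strings.TrimPrefix(include, "source.")
	return filepath.Join(dirname, lang+textmate.GrammarExtension)
}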

Functions

func CompareToken

func CompareToken(left *Token, right *Token) int
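
CompareToken is the comparison used to stabilize token order (see TokenizeReader and Mapper.Iter). A minimal sketch of applying it yourself, with tokens as in the README example:

// Sort a token slice into the package's canonical order.
slices.SortFunc(tokens, textmate.CompareToken)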

Types

type Grammar

type Grammar struct {
	// contains filtered or unexported fields
}

Grammar is the compiled grammar with precompiled regexes and an executable rule tree.

func CompileGrammar

func CompileGrammar(j GrammarJSON, dirname string, filename string) (*Grammar, error)

CompileGrammar compiles a decoded GrammarJSON into an executable Grammar. dirname determines where 'source.*' includes are resolved and defaults to `.`; filename is used to strictly validate j.ScopeName ("source.<basename>") and may be omitted.
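
A hedged sketch of decoding a grammar by hand and compiling it, assuming the encoding/json and os imports; the file paths are placeholders:

data, err := os.ReadFile("grammars/go.tmLanguage.json")
if err != nil {
	panic(err)
}
var j textmate.GrammarJSON
if err := json.Unmarshal(data, &j); err != nil {
	panic(err)
}
// "grammars" is the dirname where "source.*" includes are resolved;
// the filename lets CompileGrammar validate the scopeName as "source.go".
grammar, err := textmate.CompileGrammar(j, "grammars", "go.tmLanguage.json")
if err != nil {
	panic(err)
}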

func LoadGrammar

func LoadGrammar(pathname string) (*Grammar, error)

LoadGrammar reads a *.tmLanguage.json, validates scopeName vs filename, and compiles it into a usable Grammar.

func (*Grammar) StackItem

func (g *Grammar) StackItem() *StackItem

StackItem constructs a root frame for this grammar.

func (*Grammar) TokenizeReader

func (g *Grammar) TokenizeReader(reader io.Reader) ([]*Token, error)

TokenizeReader is a reference implementation that scans line-by-line. Offsets are global across lines; tokens are stabilized afterwards using CompareToken.

type GrammarJSON

type GrammarJSON struct {
	ScopeName    string              `json:"scopeName" plist:"scopeName"`
	FileTypes    []string            `json:"fileTypes" plist:"fileTypes"`
	FoldingStart string              `json:"foldingStartMarker" plist:"foldingStartMarker"`
	FoldingEnd   string              `json:"foldingStopMarker" plist:"foldingStopMarker"`
	FirstLine    string              `json:"firstLineMatch" plist:"firstLineMatch"`
	Repository   map[string]RuleJSON `json:"repository" plist:"repository"`
	Patterns     []RuleJSON          `json:"patterns" plist:"patterns"`
}

GrammarJSON mirrors the (subset of) TextMate JSON/Plist grammar on disk. It is decoded as-is and later compiled into Grammar.

type Mapper

type Mapper [][]*Token

Mapper is an index→tokens structure. For each byte position, it stores the tokens covering that position. Useful for renderers that draw only when the set of active tokens changes.

func (Mapper) Add

func (tm Mapper) Add(tok *Token)

Add inserts the token for all positions it covers. Empty scopes are ignored. Note: O(tok.Length); can be expensive for very long tokens.

func (Mapper) Iter

func (tm Mapper) Iter() iter.Seq2[int, []*Token]

Iter returns an iterator yielding (pos, tokens) whenever the set of tokens changes. Tokens at each position are stabilized via CompareToken for deterministic order.

type RuleJSON

type RuleJSON struct {
	Name          string              `json:"name" plist:"name"`
	Match         string              `json:"match" plist:"match"`
	Begin         string              `json:"begin" plist:"begin"`
	End           string              `json:"end" plist:"end"`
	Patterns      []RuleJSON          `json:"patterns" plist:"patterns"`
	Captures      map[string]RuleJSON `json:"captures" plist:"captures"`
	BeginCaptures map[string]RuleJSON `json:"beginCaptures" plist:"beginCaptures"`
	EndCaptures   map[string]RuleJSON `json:"endCaptures" plist:"endCaptures"`
	Include       string              `json:"include" plist:"include"`
}

RuleJSON is a raw grammar rule (as found in the JSON file). Note: capture groups are addressed by string indices "1","2",...

type StackItem

type StackItem struct {
	// contains filtered or unexported fields
}

StackItem is one frame on the parse stack carrying the active rule context.

func TokenizeLine

func TokenizeLine(offset int, text string, start int, end int, top *StackItem, yield func(*Token)) (*StackItem, error)

TokenizeLine tokenizes text[start:end] within the given stack context. Always guarantees progress: if nothing matches, emits a 1-byte filler token (Scope:"").
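
A hedged sketch of driving TokenizeLine manually, line by line, in the spirit of TokenizeReader. That offset is the line's global byte position and that start/end span the whole line are assumptions based on the doc comments; file is a placeholder io.Reader (imports: bufio):

top := grammar.StackItem() // root frame for the compiled grammar
var tokens []*textmate.Token
offset := 0
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	line := scanner.Text()
	var err error
	top, err = textmate.TokenizeLine(offset, line, 0, len(line), top,
		func(tok *textmate.Token) { tokens = append(tokens, tok) })
	if err != nil {
		panic(err)
	}
	offset += len(line) + 1 // account for the '\n' stripped by the scanner
}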

func (*StackItem) Depth

func (si *StackItem) Depth() int

Depth returns the nesting depth of this frame (used for token priority).

func (*StackItem) Root

func (si *StackItem) Root() *Grammar

Root walks up to the nearest non-nil grammar on the stack. Panics if none is found (should not happen).

type Token

type Token struct {
	// Scope is the TextMate scope name assigned by the grammar.
	Scope string
	// Start is the byte offset at which the token begins.
	Start int
	// Length is the token's length in bytes.
	Length int
	// Depth is the nesting depth; when tokens overlap, prefer the one with the higher Depth.
	Depth int
}

Token describes a scoped span in the input. Tokens may overlap; render the token with the highest Depth at a position.
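
For example, choosing what to render at a byte position pos, given a Mapper filled as in the README, could look like this sketch:

// Pick the deepest token covering pos; best stays nil if none does.
var best *textmate.Token
for _, tok := range mapper[pos] {
	if best == nil || tok.Depth > best.Depth {
		best = tok
	}
}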

func (Token) End

func (tok Token) End() int
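
End returns the token's end offset, presumably Start + Length; the README example uses it to print each token's range.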

Directories

Path      Synopsis
regexp    Package regexp implements a regular expression library using Oniguruma
