README

go-textmate

A Go implementation of a TextMate grammar tokenizer. It can load .tmLanguage.json grammars, compile them into an internal rule tree, and tokenize source text into scoped tokens. This is useful for syntax highlighting or code analysis.

Features

  • Load and compile TextMate grammars (.tmLanguage.json)
  • Support for:
    • match, begin/end blocks
    • captures, beginCaptures, endCaptures
    • include (#repo, $self, source.*)
  • Tokenizer with proper stack-based push/pop rules
  • Tokens carry:
    • Scope (TextMate scope name)
    • Start and Length
    • Depth (nesting depth, for overlapping tokens)
  • Mapper utility to iterate over tokens efficiently
  • Written in idiomatic Go, no C dependencies

Installation

Install the package with:
% go get github.com/friedelschoen/go-textmate

Usage

Load a grammar
grammar, err := textmate.LoadGrammar("grammars/go.tmLanguage.json")
if err != nil {
    panic(err)
}
Tokenize text
f, err := os.Open("example.go")
if err != nil {
    panic(err)
}
defer f.Close()
tokens, err := grammar.TokenizeReader(f)
if err != nil {
    panic(err)
}
for _, tok := range tokens {
    fmt.Printf("%s: %d..%d\n", tok.Scope, tok.Start, tok.End())
}
Using Mapper
mapper := make(textmate.Mapper, fileSize) // fileSize: length of the input in bytes
for _, tok := range tokens {
    mapper.Add(tok)
}
for pos, scopes := range mapper.Iter() {
    fmt.Println(pos, scopes)
}

License

Zlib License.

Documentation

Overview

Package textmate tokenizes source files using TextMate grammars, intended for syntax highlighting. Workflow:
1) Parse the JSON grammar into an internal rule tree (MatchRule).
2) The tokenizer walks the rules and emits scoped tokens.

Constants

This section is empty.

Variables

var (
	ErrScopeName = errors.New("unexpected `scopeName`")
)
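
ErrScopeName reports an unexpected scopeName, presumably returned when a grammar's scopeName fails validation against its filename (see CompileGrammar and LoadGrammar).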
var GrammarExtension = ".tmLanguage.json"

GrammarExtension is the expected extension for grammar files (used for "source.*" includes).
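
For illustration, a hedged sketch of the mapping this suggests: an include of "source.js" would presumably resolve to js.tmLanguage.json inside the grammar directory. resolveInclude is a hypothetical helper, not part of the package (imports: path/filepath, strings):

// resolveInclude is a hypothetical helper showing the assumed mapping
// of a "source.*" include onto a grammar file on disk.
func resolveInclude(dirname, include string) string {
	lang := strings.TrimPrefix(include, "source.")
	return filepath.Join(dirname, lang+textmate.GrammarExtension)
}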

Functions

func CompareToken

func CompareToken(left *Token, right *Token) int
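
CompareToken is the comparison used to stabilize token order (see TokenizeReader and Mapper.Iter). A minimal sketch of applying it yourself, with tokens as in the README example:

// Sort a token slice into the package's canonical order.
slices.SortFunc(tokens, textmate.CompareToken)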

Types

type Grammar

type Grammar struct {
	// contains filtered or unexported fields
}

Grammar is the compiled grammar with precompiled regexes and an executable rule tree.

func CompileGrammar

func CompileGrammar(j GrammarJSON, dirname string, filename string) (*Grammar, error)

CompileGrammar compiles a decoded GrammarJSON into an executable Grammar. dirname determines where 'source.*' includes are resolved and defaults to `.`; filename is used to strictly validate j.ScopeName ("source.<basename>") and may be omitted.
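
A hedged sketch of decoding a grammar by hand and compiling it, assuming the encoding/json and os imports; the file paths are placeholders:

data, err := os.ReadFile("grammars/go.tmLanguage.json")
if err != nil {
	panic(err)
}
var j textmate.GrammarJSON
if err := json.Unmarshal(data, &j); err != nil {
	panic(err)
}
// "grammars" is the dirname where "source.*" includes are resolved;
// the filename lets CompileGrammar validate the scopeName as "source.go".
grammar, err := textmate.CompileGrammar(j, "grammars", "go.tmLanguage.json")
if err != nil {
	panic(err)
}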

func LoadGrammar

func LoadGrammar(pathname string) (*Grammar, error)

LoadGrammar reads a *.tmLanguage.json, validates scopeName vs filename, and compiles it into a usable Grammar.

func (*Grammar) StackItem

func (g *Grammar) StackItem() *StackItem

StackItem constructs a root frame for this grammar.

func (*Grammar) TokenizeReader

func (g *Grammar) TokenizeReader(reader io.Reader) ([]*Token, error)

TokenizeReader is a reference implementation that scans line-by-line. Offsets are global across lines; tokens are stabilized afterwards using CompareToken.

type GrammarJSON

type GrammarJSON struct {
	ScopeName    string              `json:"scopeName" plist:"scopeName"`
	FileTypes    []string            `json:"fileTypes" plist:"fileTypes"`
	FoldingStart string              `json:"foldingStartMarker" plist:"foldingStartMarker"`
	FoldingEnd   string              `json:"foldingStopMarker" plist:"foldingStopMarker"`
	FirstLine    string              `json:"firstLineMatch" plist:"firstLineMatch"`
	Repository   map[string]RuleJSON `json:"repository" plist:"repository"`
	Patterns     []RuleJSON          `json:"patterns" plist:"patterns"`
}

GrammarJSON mirrors the (subset of) TextMate JSON/Plist grammar on disk. It is decoded as-is and later compiled into Grammar.

type Mapper

type Mapper [][]*Token

Mapper is an index→tokens structure. For each byte position, it stores the tokens covering that position. Useful for renderers that draw only when the set of active tokens changes.

func (Mapper) Add

func (tm Mapper) Add(tok *Token)

Add inserts the token for all positions it covers. Empty scopes are ignored. Note: O(tok.Length); can be expensive for very long tokens.

func (Mapper) Iter

func (tm Mapper) Iter() iter.Seq2[int, []*Token]

Iter returns an iterator yielding (pos, tokens) whenever the set of tokens changes. Tokens at each position are stabilized via CompareToken for deterministic order.

type RuleJSON

type RuleJSON struct {
	Name          string              `json:"name" plist:"name"`
	Match         string              `json:"match" plist:"match"`
	Begin         string              `json:"begin" plist:"begin"`
	End           string              `json:"end" plist:"end"`
	Patterns      []RuleJSON          `json:"patterns" plist:"patterns"`
	Captures      map[string]RuleJSON `json:"captures" plist:"captures"`
	BeginCaptures map[string]RuleJSON `json:"beginCaptures" plist:"beginCaptures"`
	EndCaptures   map[string]RuleJSON `json:"endCaptures" plist:"endCaptures"`
	Include       string              `json:"include" plist:"include"`
}

RuleJSON is a raw grammar rule (as found in the JSON file). Note: capture groups are addressed by string indices "1","2",...

type StackItem

type StackItem struct {
	// contains filtered or unexported fields
}

StackItem is one frame on the parse stack carrying the active rule context.

func TokenizeLine

func TokenizeLine(offset int, text string, start int, end int, top *StackItem, yield func(*Token)) (*StackItem, error)

TokenizeLine tokenizes text[start:end] within the given stack context. Always guarantees progress: if nothing matches, emits a 1-byte filler token (Scope:"").
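
A hedged sketch of driving TokenizeLine manually, line by line, in the spirit of TokenizeReader. That offset is the line's global byte position and that start/end span the whole line are assumptions based on the doc comments; file is a placeholder io.Reader (imports: bufio):

top := grammar.StackItem() // root frame for the compiled grammar
var tokens []*textmate.Token
offset := 0
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	line := scanner.Text()
	var err error
	top, err = textmate.TokenizeLine(offset, line, 0, len(line), top,
		func(tok *textmate.Token) { tokens = append(tokens, tok) })
	if err != nil {
		panic(err)
	}
	offset += len(line) + 1 // account for the '\n' stripped by the scanner
}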

func (*StackItem) Depth

func (si *StackItem) Depth() int

Depth returns the nesting depth of this frame (used for token priority).

func (*StackItem) Root

func (si *StackItem) Root() *Grammar

Root walks up to the nearest non-nil grammar on the stack. Panics if none is found (should not happen).

type Token

type Token struct {
	// Scope is the TextMate scope name assigned by the grammar.
	Scope string
	// Start is the byte offset at which the token begins.
	Start int
	// Length is the token's length in bytes.
	Length int
	// Depth is the nesting depth; when tokens overlap, prefer the one with the higher Depth.
	Depth int
}

Token describes a scoped span in the input. Tokens may overlap; render the token with the highest Depth at a position.
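
For example, choosing what to render at a byte position pos, given a Mapper filled as in the README, could look like this sketch:

// Pick the deepest token covering pos; best stays nil if none does.
var best *textmate.Token
for _, tok := range mapper[pos] {
	if best == nil || tok.Depth > best.Depth {
		best = tok
	}
}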

func (Token) End

func (tok Token) End() int
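
End returns the token's end offset, presumably Start + Length; the README example uses it to print each token's range.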

Directories

Path      Synopsis
regexp    Package regexp implements a regular expression library using Oniguruma
