
Claude Code Skills: Metadata, Prompts, and Automating Statistics

Miles Wallace

Prompts and Automating Statistics

A Claude Code skill is a reusable unit of behavior that the model can invoke during a session. Every skill is defined by three pieces of metadata: a name, a description, and a trigger condition that tells the model when to use it. That metadata is not decorative. It functions as a compressed prompt, a standing instruction that shapes model behavior before any user message arrives. When the model reads a skill’s description it is effectively reading a directive: this is what I do, this is when I do it, and this is what the result should look like.

A traditional prompt is written once and passed at inference time. Skill metadata is written once and loaded at session start, but it operates on the same principle: natural language instructions that constrain and direct model outputs. The difference is scope. A prompt governs a single exchange. Skill metadata governs a category of exchanges: every invocation of that skill across every session that loads it.

This means the quality of a skill’s description has compounding returns. A vague description produces inconsistent behavior across invocations. A precise description produces behavior that is reproducible enough to trust in automation. The practical standard for a well-written skill description is that a developer reading it with no other context could predict within narrow bounds what the model will do when triggered.

Three properties distinguish effective skill metadata from ineffective metadata (a sketch combining all three follows the list):

Specificity of scope. “Help with tasks” is not a trigger condition. “Use this when the user mentions a CSV file and asks for summary statistics or trend analysis” is. The model needs enough signal to distinguish this skill from every other loaded skill.

Output contract. The description should state the expected form of the output (a report, a code block, a list, a diff) so the model does not have to infer it from context each time.

Negative space. The best descriptions include at least one sentence about what the skill does not do. “This skill generates analysis only; it does not modify files or run commands” eliminates an entire class of unintended actions.
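Put together, the three properties make a description read less like documentation and more like a standing prompt. Here is a sketch written as the YAML frontmatter of a SKILL.md file, the format Claude Code skills are defined in; the field contents are illustrative, not taken from a shipped skill:

    ---
    name: csv-summary
    description: >
      Use this skill when the user mentions a CSV file and asks for
      summary statistics or trend analysis. Produce one complete,
      runnable Python script that prints the dataset's shape, column
      types and null counts, descriptive statistics, categorical value
      counts, a correlation matrix, and z-score outlier counts. Output
      the script as a single code block with no surrounding commentary.
      This skill generates analysis code only; it does not execute code
      or modify files.
    ---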

The clearest case for skill automation is work that is repetitive, low-variance, and time-consuming in aggregate. Consider a developer who starts each morning by checking three things: unmerged branches older than seven days, open pull requests with no reviewer assigned, and failing CI jobs on the main branch. Each check takes two to four minutes manually. Combined they consume roughly fifteen minutes every morning, or roughly sixty hours per year.
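A description for that routine might read as follows (again a sketch; the wording is mine, not from a shipped skill):

    ---
    name: morning-check
    description: >
      Use this skill when the user asks for the morning check or a
      repository health check. Run three checks in order: unmerged
      branches older than seven days, open pull requests with no
      reviewer assigned, and failing CI jobs on the main branch.
      Output a short plain-text report with one section per check.
      This skill reads repository state only; it does not merge,
      close, or re-run anything.
    ---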

The model reads this description, maps it to the morning-check invocation, and executes the three checks in sequence without further instruction. The time cost drops from fifteen minutes to under thirty seconds. Across a fifty-week working year that skill recovers roughly sixty hours. The description took twelve minutes to write.
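That figure is easy to verify (a back-of-envelope check, assuming five-day weeks; the half-minute invocation cost is my estimate):

    # Back-of-envelope: hours recovered per year by the morning-check skill.
    minutes_manual = 15      # three manual checks each morning
    minutes_skill = 0.5      # one skill invocation (assumed)
    working_days = 5 * 50    # five-day weeks, fifty working weeks
    hours_saved = (minutes_manual - minutes_skill) * working_days / 60
    print(f"{hours_saved:.0f} hours per year")  # -> 60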

The pattern generalizes to any task with these properties: fixed inputs, fixed outputs, and no meaningful decision-making required from the human. Filing expense categories, rotating log archives, generating weekly standup summaries, and updating a changelog from commit messages are all skill candidates. The selection criterion is not complexity. It is repetition.

The second category is higher-value: tasks where the human could do the work but the iteration cost is high enough that they often skip it. Statistical analysis of a new dataset is a canonical example. Loading a file, profiling its shape, checking distributions, testing for outliers, and producing a summary typically involves thirty to sixty lines of boilerplate Python before any real analysis begins. Most of that code is identical across datasets.

When invoked, the model produces a complete, runnable script. A representative output for a sales dataset looks like this:

    import pandas as pd
    from scipy import stats

    df = pd.read_csv("sales_data.csv")

    print("=== Shape ===")
    print(df.shape)

    print("\n=== Types & Nulls ===")
    summary = pd.DataFrame({
        "dtype": df.dtypes,
        "nulls": df.isnull().sum(),
        "null_pct": (df.isnull().sum() / len(df) * 100).round(2)
    })
    print(summary)

    print("\n=== Descriptive Stats ===")
    print(df.describe().T.assign(
        skew=df.skew(numeric_only=True),
        kurtosis=df.kurtosis(numeric_only=True),
        iqr=df.select_dtypes("number").quantile(0.75) - df.select_dtypes("number").quantile(0.25)
    ))

    print("\n=== Categorical Value Counts ===")
    for col in df.select_dtypes("object").columns:
        if df[col].nunique() < 50:  # skip high-cardinality columns (IDs, names)
            print(f"\n{col}:")
            print(df[col].value_counts())

    print("\n=== Correlation Matrix ===")
    print(df.select_dtypes("number").corr().round(3))

    print("\n=== Outlier Counts (Z-score > 3) ===")
    # Flag values more than three standard deviations from their column mean.
    numeric = df.select_dtypes("number")
    z_scores = numeric.apply(stats.zscore, nan_policy="omit").abs()
    outliers = (z_scores > 3).sum()
    print(outliers[outliers > 0])

The script is not novel. Any competent data analyst could write it. The value of the skill is not that it produces code humans cannot; it is that it produces that code in three seconds instead of eight minutes, with no copy-pasting from a previous project, no hunting for the scipy import path, and no remembering whether kurtosis is a DataFrame method or requires a separate call.

Skills do not produce linear returns. The first skill a developer writes saves a fixed amount of time. Each subsequent skill reduces the friction of writing the next one, because the pattern (tight description, clear output contract, explicit trigger condition) becomes habitual. By the time a developer has written ten skills, the marginal cost of writing the eleventh is low enough that tasks which would have been done manually once a week become automation candidates.

The aggregate effect on a development workflow is measurable. Studies of developer tooling adoption consistently find that the highest-leverage interventions are not the ones that make hard tasks faster but the ones that eliminate the decision cost of whether to automate at all. A skill system with good metadata lowers that decision cost close to zero. The question shifts from “is this worth automating?” to “do I already have a skill for this?” That shift, small as it sounds, is the mechanism through which individual productivity improvements compound into structural changes in how work gets done.