Creating a Database with AI Skills
========================================

`astrodb_bot `_ provides a set of **AI skills** that guide an assistant (Claude, Cursor, etc.) through building a new database from a raw data table: parsing the table, mapping its columns to the :doc:`AstroDB template schema <../getting_started/template_schema/template_schema>`, generating a Felis ``schema.yaml``, and creating a populated ``DatabaseName.sqlite``.

**Real catalogs are messy:** they have inconsistent types, missing values, and column names that don't match the schema. Handling that becomes much easier with an AI that has these skills installed.

These skills automate the manual workflow described elsewhere in this section (:doc:`make_new_db/index` and :doc:`modifying/index`).

.. note::

   The skills require an AI **skill runner**: an AI that reads a ``skills/`` directory such as ``.claude/skills/``, ``.cursor/skills/``, or ``.agents/skills/``.

Installation
------------

Copy the ``skills/`` directory from the `astrodb_bot repository `_ into the location your AI reads skills from. For example, with Claude:

.. code-block:: bash

   git clone https://github.com/astrodbtoolkit/astrodb_bot.git
   mkdir -p .claude/skills
   cp -r astrodb_bot/skills/* .claude/skills/

Requirements
~~~~~~~~~~~~

* Python 3.11 or greater
* ``uv`` or ``pip`` to install Python packages
* ``astropy``, ``pandas``, ``lsst-felis``, ``astrodbkit``, and ``astrodb_utils``

The skills
----------

The skills are designed to run in sequence, each feeding the next, but any of them can also be run on its own. Each one links to its full definition in the ``astrodb_bot`` repository.

#. `astrodb-setup `_: Sets up the environment for building a new database. Has the user clone the template repository and walks them through naming their database.
#. `astrodb-parse-data-table `_: Reads a data table (FITS, CSV, ECSV, HDF5, VOTable, Parquet, Excel, ...)
   and summarizes every column's name, description, units, and type as a Markdown and HTML report.
#. `astrodb-match-schema `_: Maps each parsed column to a table and field in the AstroDB template schema, assigning a confidence level to every match and flagging anything it cannot place.
#. `astrodb-validate-schema-mapping `_: Checks the proposed mapping against the actual data: null values landing in non-nullable fields, and type mismatches between the data and the schema.
#. `astrodb-generate-schema `_: Turns the validated mapping into a Felis-format ``schema.yaml`` (see :doc:`modifying/yaml`) and runs ``felis validate`` on it.
#. `astrodb-create-db `_: Creates an empty SQLite database from the validated ``schema.yaml``, following the `astrodb-template-db `_ file layout, and generates a matching test suite.
#. `astrodb-ingest-publication `_: Generates and runs a script that adds publications (references/citations) to the ``Publications`` lookup table using ``astrodb_utils.publications.ingest_publication``. Handles a single paper, a batch from a data file's reference column, or backfilling existing rows with missing metadata. Every reference used elsewhere in the database must exist here first. See also :doc:`../db_access/ingesting/ingesting_publications`.
#. `astrodb-ingest-source `_: Generates and runs a script that ingests sources from the data table into the new database using ``astrodb_utils.sources.ingest_source``. See also :doc:`../db_access/ingesting/getting_started_ingesting`.

Intermediate artifacts (the parsed-column report, the schema mapping, the generated ``schema.yaml``, and the ingest scripts) are written to a ``tmp/`` folder, so they don't clutter your project and you can inspect each step.

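
To make the validation step in the list above more concrete, here is a minimal sketch of the two checks it describes: nulls landing in non-nullable fields, and values that do not fit the target type. It is not the skill's actual implementation. ``FieldSpec``, ``is_null``, ``validate_mapping``, the sample rows, and the target table and field names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """Hypothetical description of one target field in a schema mapping."""
    table: str
    field: str
    py_type: type
    nullable: bool


def is_null(value):
    """Treat None, blank strings, and NaN as missing values."""
    if value is None:
        return True
    if isinstance(value, str) and value.strip() == "":
        return True
    # NaN is the only float that is not equal to itself.
    return isinstance(value, float) and value != value


def validate_mapping(rows, mapping):
    """Return a list of human-readable problems found in `rows`.

    rows: list of dicts keyed by source column name.
    mapping: dict of source column name -> FieldSpec.
    """
    problems = []
    for column, spec in mapping.items():
        target = f"{spec.table}.{spec.field}"
        for i, row in enumerate(rows):
            value = row.get(column)
            if is_null(value):
                if not spec.nullable:
                    problems.append(f"row {i}: null in non-nullable {target}")
                continue
            try:
                spec.py_type(value)  # can the value be converted to the target type?
            except (TypeError, ValueError):
                problems.append(
                    f"row {i}: {column}={value!r} is not a valid "
                    f"{spec.py_type.__name__} for {target}"
                )
    return problems


# Made-up sample data and mapping (assumptions, not taken from a real catalog).
rows = [
    {"GalaxyName": "Andromeda", "RA": "10.68", "Dec": "41.27"},
    {"GalaxyName": "", "RA": "23h", "Dec": "-30.1"},
]
mapping = {
    "GalaxyName": FieldSpec("Sources", "source", str, nullable=False),
    "RA": FieldSpec("Sources", "ra_deg", float, nullable=True),
    "Dec": FieldSpec("Sources", "dec_deg", float, nullable=True),
}
problems = validate_mapping(rows, mapping)
for problem in problems:
    print(problem)
```

With the sample data, the second row is reported twice: once for its blank name and once for an ``RA`` value that cannot be converted to a float. A report like this is the kind of thing worth reading in ``tmp/`` before moving on to schema generation.
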

Example and Prompt Advice
-------------------------

**We recommend starting in plan mode (Claude, Cursor, and Codex all have /plan).**

The example prompt was:

*Review your astro-db skills and create a plan to have a fully working database after going through* ``@NearbyGalaxies_Jan2021_PUBLIC.fits``

Plan mode tells the AI to inspect the input FITS file and propose a complete build plan using all of the available skills. The output of this prompt was a populated ``LocalGroupDB.sqlite``.

You can also invoke the skills one at a time.

Advice for working with Claude
------------------------------

* **Give the AI the template as a reference.** Point it at the `astrodb-template-db `_ repository, which contains example ``schema.yaml`` files and test suites for every template table. This helps the AI structure the new database and its tests.
* **Keep track of token usage.** Cost grows with the number of tokens used. A more capable model, an advisor AI, and higher effort settings will improve the result but also increase the cost.
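
Whichever way the skills are run, one quick sanity check on the result is to count the rows in each table of the generated SQLite file. The sketch below uses only Python's standard ``sqlite3`` module. ``table_row_counts`` is a hypothetical helper, and the in-memory database with its tables and rows is a made-up stand-in for a real file such as ``LocalGroupDB.sqlite``.

```python
import sqlite3


def table_row_counts(connection):
    """Return {table_name: row_count} for every user table in the database."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    counts = {}
    for (name,) in cursor.fetchall():
        # Quote the identifier, since table names cannot be bound as parameters.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = connection.execute(
            f"SELECT COUNT(*) FROM {quoted}"
        ).fetchone()[0]
    return counts


# Stand-in for sqlite3.connect("LocalGroupDB.sqlite"): an in-memory database
# with made-up tables and rows, so the sketch runs without a real build.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Publications (reference TEXT PRIMARY KEY)")
connection.execute("CREATE TABLE Sources (source TEXT PRIMARY KEY, reference TEXT)")
connection.execute("INSERT INTO Publications VALUES ('Ref1')")
connection.executemany(
    "INSERT INTO Sources VALUES (?, ?)",
    [("Galaxy A", "Ref1"), ("Galaxy B", "Ref1")],
)
counts = table_row_counts(connection)
for name, count in counts.items():
    print(f"{name}: {count}")
```

A table that unexpectedly reports zero rows is a hint that an ingest step was skipped or failed, and the corresponding script in ``tmp/`` is the place to look.
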