Laravel MCP for ChatGPT Apps: Lessons from a Real App

Tom Oehlrich

I recently tried to turn an existing Laravel app into a ChatGPT app using MCP.

The app runs at aipricingcomparison.com and compares LLM pricing based on input and output tokens. The business logic was already there. The goal was simply to expose it inside ChatGPT.

This ChatGPT app is not yet publicly available. It works in developer mode but is still a work in progress. The next steps are improving the design, adding more tools, and preparing it for release. If you're interested in how this evolves, check back here - I'll keep writing about it as the project progresses.


The Starting Point

On the Laravel side, things were already in good shape. The MCP server worked as expected in MCP Inspector and MCPJam - tools were callable, responses were correct, no major issues.

So I assumed connecting this to ChatGPT would be a relatively small step.

That assumption was wrong.


Where Things Started to Break

As soon as I tried to create a connector in ChatGPT, issues appeared. Connector creation failed without meaningful error messages. Tool calls were inconsistent. Sometimes requests went through, sometimes not.

One important thing I realized early on: MCP working in isolation does not guarantee it will work inside ChatGPT. ChatGPT adds its own layer - stricter validation, tighter timing constraints, and connector state that isn't always transparent. At some point it stops being a code problem and becomes more about understanding how the whole system behaves.


No Single Fix

I want to be clear about this: there was no single change that made everything work. The system only became stable after a number of smaller adjustments, and because ChatGPT doesn't give you much feedback, it's hard to isolate root causes precisely. Everything that follows is a set of changes that together led to a reliable setup - not a list of "the fix."


Removing Auth First

I initially had OAuth / Passport in place. That adds a lot of moving parts - routes, middleware, token handling - and when something fails, you don't know whether the issue is MCP itself, the ChatGPT connector, or the auth layer.

So I removed OAuth completely for the first iteration and exposed the MCP endpoint without authentication. This reduced complexity significantly and made it much easier to reason about what was actually going wrong. Auth can come back once the core integration is stable.


Keeping the Endpoint Simple

The MCP endpoint itself is intentionally minimal - a single /mcp route with a standard POST setup. I also verified the route explicitly with:

php artisan route:list --path=mcp -v

It sounds trivial, but having a predictable and clearly defined endpoint eliminates an entire class of potential issues.
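For reference, a minimal registration along these lines is what I mean by "predictable" - note that McpController is a placeholder name for illustration, not the app's actual controller:

```php
<?php

// routes/api.php -- sketch only. A single POST endpoint handles
// all MCP traffic; McpController is a hypothetical name.
use Illuminate\Support\Facades\Route;
use App\Http\Controllers\McpController;

Route::post('/mcp', McpController::class);
```

One route, one method, no parameters - if `route:list` shows anything else matching /mcp, that's worth investigating before blaming the connector.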


Adding Logging Early

One of the most useful changes was adding dedicated middleware to log incoming requests, outgoing responses, and execution time. Without this, debugging becomes guesswork - especially with ChatGPT in the loop where you get very little visibility into what's happening during the connector handshake or tool invocation.
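A sketch of such middleware - the log channel and field names are illustrative, not the exact ones I use:

```php
<?php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Log;

// Logs every MCP request/response pair with its duration so that
// connector failures can be traced without guesswork.
class LogMcpTraffic
{
    public function handle(Request $request, Closure $next)
    {
        $start = microtime(true);

        Log::channel('mcp')->info('mcp.request', [
            'body' => $request->getContent(),
        ]);

        $response = $next($request);

        Log::channel('mcp')->info('mcp.response', [
            'status'      => $response->getStatusCode(),
            'duration_ms' => round((microtime(true) - $start) * 1000),
        ]);

        return $response;
    }
}
```

Registering this only on the /mcp route keeps the noise down while still capturing every handshake and tool invocation.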


Reducing to One Tool

Originally the MCP server included multiple tools. More tools mean more variables when debugging, so I reduced it to a single focused tool: comparing text-based pricing across models. This made it much easier to verify schema correctness and observe consistent behavior.


Fixing Schema Issues

There were also smaller issues at the tool level. Schema definitions that worked locally caused errors in ChatGPT. Certain parameter styles weren't accepted. Adjusting to a simpler, more standard structure resolved these - ChatGPT appears to be stricter about schema than local MCP tooling.
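As an illustration, a flat input schema along these lines caused no problems - the property names here are made up for the example:

```json
{
  "type": "object",
  "properties": {
    "input_tokens": { "type": "integer", "description": "Expected input tokens" },
    "output_tokens": { "type": "integer", "description": "Expected output tokens" }
  },
  "required": ["input_tokens", "output_tokens"]
}
```

The general pattern: a top-level object, plain types, and an explicit required list.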


Performance Matters More Than Expected

Initially, parts of the ranking logic were implemented in PHP after fetching data from the database. That worked locally, but under ChatGPT the response times were longer and requests occasionally failed or timed out.

Moving the core calculations into SQL made a noticeable difference. Results are now pre-calculated in the query, with sorting and limiting happening at the database level. Response times dropped and reliability improved.
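A simplified sketch of the idea, with hypothetical table and column names:

```php
<?php

use Illuminate\Support\Facades\DB;

// Sketch: compute the total cost per model in SQL instead of PHP.
// Table and column names are illustrative.
$results = DB::table('models')
    ->selectRaw(
        'name, (input_price_per_token * ?) + (output_price_per_token * ?) AS total_cost',
        [$inputTokens, $outputTokens]
    )
    ->orderBy('total_cost')
    ->limit(10)
    ->get();
```

The database only returns the rows the tool actually needs, already ranked, so PHP does nothing but serialize the result.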


Connector State Can Be Misleading

At one point I kept seeing errors like "OAuth client not found" - even though OAuth had already been removed. The issue wasn't the code. It was stale connector state inside ChatGPT.

Deleting and recreating the connector fixed it.

This is worth keeping in mind before spending too much time debugging the application itself. The connector state in ChatGPT can become inconsistent and may not reflect what your backend is actually doing.


Adding a UI with MCP Resources

Once tool execution was stable, I added a UI layer using an MCP resource. This involved registering a resource with a ui:// URI, returning HTML with the text/html+skybridge MIME type, and connecting the tool output to the UI via metadata.

On the backend everything looked correct - data was present, structure was valid. But the UI wasn't rendering the expected values.


How Widget Data Actually Works

The official model is straightforward: the tool returns structuredContent, and the widget renders it. In practice it works more like this:

  1. Widget loads and does an initial render (no data yet)
  2. ChatGPT injects data into window.openai
  3. The widget needs to re-read and re-render

This explains a lot of confusing behavior. On top of that, there isn't a single guaranteed data source - depending on timing and context, the data can appear in _meta.openai/outputTemplate, window.openai.props, toolOutput.structuredContent, or other variations. The widget needs to check multiple possible sources and use the first valid one.
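A sketch of that defensive lookup - the candidate order is a guess that worked for me, not a documented contract:

```javascript
// Return the first non-empty candidate payload, or null if
// nothing has been injected yet.
function pickWidgetData(openai) {
  const candidates = [
    openai?.toolOutput?.structuredContent,
    openai?.props,
    openai?.toolOutput,
  ];
  for (const candidate of candidates) {
    if (candidate && typeof candidate === "object" && Object.keys(candidate).length > 0) {
      return candidate;
    }
  }
  return null;
}
```

The widget calls this on every render attempt and stays in a loading state as long as it returns null.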


The "0 / 0" Problem

My widget shows a summary line like "Top matches for X input and Y output tokens." These values come from the user's request. But even though the backend sends the correct data, the widget initially renders "0 / 0" - because it renders immediately on load, before ChatGPT has injected the actual data.

A few patterns helped make this more reliable. Rather than rendering fallback values like 0, I treat the first render as a loading state. The widget checks multiple data sources, retries rendering for a short period, and preserves the last valid data so that temporary empty states don't overwrite values that are already there.
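The keep-last-valid-data rule can be captured in a small state transition - the names here are mine, not from any SDK:

```javascript
// The first render starts in { loading: true, data: null }. Each time
// the widget re-reads window.openai, it folds the injected payload in:
// a non-empty payload replaces the data, while an empty one never
// overwrites data that is already there.
function nextState(state, injected) {
  const hasData =
    injected && typeof injected === "object" && Object.keys(injected).length > 0;
  if (hasData) {
    return { loading: false, data: injected };
  }
  return state; // keep last valid data, or stay in the loading state
}
```

In the widget this runs on a short retry timer that re-renders whenever the state changes and stops once loading flips to false.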


Final Setup

The working setup is fairly simple:

  1. A single unauthenticated /mcp endpoint
  2. Dedicated middleware logging requests, responses, and execution time
  3. One focused tool with a simple, flat input schema
  4. Core calculations done in SQL, with sorting and limiting in the database
  5. A widget that treats its first render as a loading state and preserves the last valid data

Getting there meant removing complexity rather than adding it.


What Comes Next

With a stable foundation in place, the next steps are reintroducing authentication in a controlled way, expanding the toolset, and improving the UI components.


If you're working on a similar setup and run into issues, feel free to reach out.