AI Software Development: How to Build a Prod-Level App Using AI

There has been considerable talk about AI software development. Will AI replace human engineers? Can startups make production-level software with AI? Beyond the hype and fear-mongering, there’s very little empirical evidence of its use on commercial projects.

At MindK, we tried to fix this by outsourcing as many tasks as possible to our (not-so-trusty) AI assistant and watching it fail or succeed. This article is an honest overview of this experience and a step-by-step guide on how to build apps using AI.

But first, a little background. Early in 2024, I consulted for a Silicon Valley startup that wanted to build a mobile app for electricians, plumbers, and other technicians. According to the founders, these people could do ~25% of the work remotely, including easy fixes and preliminary estimates. However, they couldn’t charge for these services. This is what their app would fix.

As a Lead Front-end Developer, I took part in project estimation. The app required a unique combination of integrations (Stripe, Daily.co, Twilio) and lots of custom logic. This raised the costs to as much as $200,000 with an in-house team, which was simply too much.

Our team had previously experimented with AI software development on internal projects. We suggested using some of that experience to save costs. The idea was to have one Senior Engineer create the entire app using AI for everything from design to deployment.

Here’s what I learned when developing the Service Call app.

AI in software engineering: understanding the capabilities

Our first stop is to understand the real AI capabilities. The table below lists typical project tasks that AI proponents claim you can safely outsource to an artificial friend.

As a natural skeptic, I ran a survey across the departments to collect feedback about the past six months of AI usage. The result is color-coded to reflect the current generative AI capabilities:

Green: incredibly helpful.
Yellow: helpful with some caveats.
Red: don’t waste your time.

Keep in mind that the real mileage you get out of AI depends heavily on your technical skills, domain knowledge, and project specifics.

This means that an engineer’s mind is still the key factor in a product’s eventual success.

Development	QA	DevOps
Refactoring	Creating test cases	Generation of Docker files
Debugging	Creating tests with Playwright	Troubleshooting CI/CD issues
Code generation	Analysis of test coverage	Writing bash scripts
Documentation generation	Prioritization of test scenarios	Automation of infrastructure monitoring
Optimization of algorithms	Test data generation	Optimization of server configurations
Code analysis for vulnerabilities	Automation of regression testing	Log analysis to identify performance issues
Code auto-completion	Log analysis to detect anomalies	Generating rules for firewalls
Generation of unit tests	Generation of test reports	Automation of deployment and scaling
Porting to a new programming language	Creation of load testing scenarios	Creation of disaster recovery scenarios
Explanation of complex parts of the code	Analysis of A/B testing results	Optimization of caching and load balancing

Project Management	Business Analysis	Design
Analysis of email text for tone coloration	Requirements analysis	Creation of design ideas
Checking texts with technical context	Generation of business cases	Generate HTML/CSS from design descriptions
Project risk assessment	Immersion in the client’s business area/context	Creation of color palettes
Project schedule generation	Creating a summary of large amounts of data	Generation of icons and logos
Analysis of team efficiency	Forecasting market trends	Optimization of design for different devices
Creating presentations for stakeholders	Competitor analysis	Analysis of design accessibility (AAA)
Forecasting project completion dates	Generation of business models	Creation of interface prototypes
Optimization of resource allocation	Optimization of business processes	Generating variants of font compositions
Automation of project status reporting	Analysis of user needs	Analysis of design trends
Analysis of team communications		Creation of animations for interfaces

Over nine weeks on the Service Call project, I noted all the AI quirks and optimized the prompting approach. There were mistakes I’d rather avoid and some major disappointments. However, the project was a great success overall. According to my calculations, using AI saved the client ~$160,000 USD.

Before going through a step-by-step process of making software with AI, let’s sum up the pros, cons, and use cases for artificial intelligence in software development.

Pros of AI for software development

The latest Google research shows that AI adoption has many positive impacts on developers. They include higher productivity, better code & documentation quality, and lower burnout. Here are some of the pros:

AI can create code from almost nothing. Ask AI to describe business logic. Then request adjustments, and re-generate the code.
Follows proven best practices in software development. This ensures a high-quality approach, even if you don’t specify the best practices.
Great for debugging, error resolution (suggestions, summaries, insights), and codeless test automation.
Works well for refactoring. Provide the code and request the necessary refactoring actions (optimization, readability, maintainability, or error handling).
Good at refining and optimizing the code. You can give it a code example or link an article describing a suitable approach.
Suitable for simple integrations and basic component code.
Excellent for generating unit, integration, and end-to-end (E2E) tests. Complex scenarios (advanced mocking, intricate API interactions, or performance optimizations) require detailed and specific prompts to achieve accurate and effective test coverage.

Cons of AI for software development

The same Google research (the 2024 DORA report) shows that a 25% increase in AI adoption reduces delivery stability by 7.2%. This is likely due to developers making larger changes with less effort.

Code quality is generally lower than what’s expected of skilled developers.
Provides rather basic settings (eslint/prettier/husky). You need to manually adjust or maintain a configuration preset to ensure consistency with your team’s coding standards.
Has trouble with project files and folder structure. It can suggest file locations and structures that differ slightly from the ones used at the beginning.
May have outdated knowledge. The API of the service you’re trying to integrate could already have breaking changes and updates. So, you might spend hours debugging when all you needed was to change the API version.
Linking the latest version of a package to fix incorrect syntax brings only temporary improvements. At some point, AI simply abandons this knowledge and writes code for the old version it was trained on. For example, I had to manually correct the props for MUI Data Grid.
Often writes tests just so that they pass. However, you’re free to select only those suggestions that bring value.
Sometimes enters a loop and stops answering. The best solution in this case is to start a new conversation.
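
To make the point about tests written just to pass more concrete, here is a minimal TypeScript sketch. The applyDiscount function and both checks are hypothetical illustrations, not code from the Service Call project. The first check restates the implementation, so it can never fail. The second one pins a value worked out by hand.

```typescript
// Hypothetical function under test: price after a percentage discount.
function applyDiscount(price: number, percent: number): number {
  return price - (price * percent) / 100;
}

// Low-value check: it recomputes the expected value with the same formula,
// so it passes even if the formula itself is wrong.
function tautologicalCheck(): boolean {
  const price = 200;
  const percent = 25;
  return applyDiscount(price, percent) === price - (price * percent) / 100;
}

// Useful check: the expected value is worked out independently
// (25% off 200 is 150).
function meaningfulCheck(): boolean {
  return applyDiscount(200, 25) === 150;
}
```

When reviewing AI-suggested tests, the second shape is the one worth keeping.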

To avoid generic output, you need to understand the desired result enough to describe it. Provide details, set rules, and guide AI code assistants to correct answers.

How to make production-level software with AI?

Now let’s look at the process I’d use to create the Service Call app if I had a time machine to tell me about all the dead ends, missed opportunities, and sleepless nights.

#1 Select the best AI software development tools

Now, we can select the best assistant to make software with AI. For this purpose, we aggregated all available performance information for the most popular AI tools.

Llama 3.1 (Meta)

Pros: high efficiency in code generation. Multilingual text processing, math problem-solving, synthetic data generation for training smaller models.

Cons: not plug-and-play and requires configuration. This takes up too much time for smaller projects.

GPT-4o (OpenAI)

Pros: code generation, explanation of code and concepts, debugging, guidance for API integration. GPT-4o has improved contextual understanding and supports text, audio, image, and video processing.

Cons: knowledge stops at a training cutoff (October 2023 for GPT-4o, earlier for GPT-3.5 and GPT-4).

Claude 3.5 Sonnet (Anthropic)

Pros: advanced code generation and analysis. Working with long contexts, explanation of complex concepts.

Cons: can’t access the Internet or navigate links. All information must be copied directly to the chat.

GitHub Copilot

Pros: code suggestions in the IDE, function generation from comments, context-sensitive completions.

Cons: an advanced autocomplete, not a full-blown coding assistant with the possibility to upload files.

Here’s how these AI tools stack up against each other in a variety of benchmarks for coding, reasoning, and math.

Legend:

MMLU (Massive Multitask Language Understanding): evaluates the ability to handle a wide range of academic tasks without training on specific tasks.

MMLU PRO: a harder variant of the MMLU benchmark with more reasoning-heavy questions, typically scored with five-shot chain-of-thought prompting.

IFEval: measures how well a model follows explicit, verifiable instructions in a prompt.

Code:

HumanEval: generation of correct code.

MBPP EvalPlus: generation of correct code for basic Python problems, checked against an extended test suite and scored zero-shot.

Math:

GSM8K: assessment of multi-step problem-solving on grade-school math word problems.

MATH: competition-level mathematics problems.

Reasoning:

ARC Challenge: challenging multiple-choice questions designed to test deep reasoning skills.

GPQA: graduate-level science questions that are hard to answer by searching the web, typically scored with chain-of-thought (CoT) prompting.

Tool usage:

BFCL: evaluates the ability to interact with tools and APIs to retrieve and manipulate data.

Nexus: measures the ability to integrate and effectively use different tools.

Long context:

ZeroSCROLLS/QuALITY: tests the ability to understand and generate answers based on long contexts.

InfiniteBench/En.MC: evaluates multiple-choice questions in long contexts.

NIH/Multi-needle: a needle-in-a-haystack test that measures the ability to retrieve several facts hidden in a long context.

Multilingual:

Multilingual MGSM: measures math problem-solving in different languages using the Multilingual Grade School Math (MGSM) benchmark.

Claude 3.5 Sonnet outperforms competitors in most relevant benchmarks. However, the ability to navigate links is more important to me than a marginal improvement in coding ability. Neither Claude 3.5 Sonnet nor GPT-4o existed at the start of the project, so we had to rely on GPT-4.

Artificial intelligence software development is an extremely fast-paced niche. It’s always a good idea to stay on the lookout for new models and tools. We’re currently experimenting with OpenAI o1, the Cursor code editor (cursor.ai), and v0 by Vercel, and will provide updates later.

#2 Customize your AI assistant

After choosing the best AI coding assistants for your project, it’s time to tinker with their brains. OpenAI allows you to Customize ChatGPT with up to 1500 characters of custom instructions:

What would you like ChatGPT to know about you to provide better responses?

I found it useful to share your honest experience with software development and specific technologies. AI tools offer much more detailed and technical information if you call yourself a Software Engineer rather than a Project Manager.

For example, I am a senior full-stack developer specializing in React, Node.js, NestJS, TypeScript, and Postgres. I often work with complex React components that require advanced performance optimizations, state management, and real-time data handling (e.g. polling). My work includes integrating external APIs and ensuring scalability in production-level applications. I prioritize practical, scalable solutions for challenging components with detailed, real-world use cases.

How would you like ChatGPT to respond?

Tailor responses to the complexity of my components. Focus on specific optimizations related to the task, such as handling API polling, complex state updates, reducing re-renders, and optimizing for scalability. Provide detailed examples that align with real-world challenges, including error handling and performance strategies for production environments. Avoid overly generic suggestions unless requested.

As you can see, the provided example is focused on optimization, scalability, and performance. You might need a few tries to create the perfect instructions for the project. Create a few instructions for a basic task. Then, evaluate and adjust the results using AI tools.
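
Since each custom-instruction field is capped at 1500 characters, it can help to check a draft before pasting it in. The sketch below is a hypothetical helper, not part of any OpenAI tooling. It assumes the limit counts characters the same way a JavaScript string length does, which may differ slightly for emoji and similar symbols.

```typescript
// Limit quoted in the article for each custom-instruction field.
const MAX_INSTRUCTION_CHARS = 1500;

interface LengthReport {
  length: number;    // characters used
  remaining: number; // characters left; negative when over the limit
  fits: boolean;     // true when the text is within the limit
}

// Report how a draft instruction compares with the character limit.
function checkInstructionLength(
  text: string,
  limit: number = MAX_INSTRUCTION_CHARS
): LengthReport {
  const length = text.length;
  return { length, remaining: limit - length, fits: length <= limit };
}
```

Running each draft through checkInstructionLength while you iterate shows how much room is left for extra rules.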

Another way ChatGPT can create a person’s profile is called Memory. AI can memorize details about software developers based on frequent requests or specific prompts. This Memory is an account-wide context that’s not limited to a single chat.

Now, you’re pretty much ready to start the development process. If you still want to learn more about AI, here’s a list of free, high-quality resources and courses that will get you up to speed with artificial intelligence and software engineering:

OpenAI ChatGPT guide
GitHub Copilot Documentation
Prompt Engineering Guide
Learn Prompting course
Learn Prompting: “ChatGPT for Everyone”
Udemy: ChatGPT Masterclass: ChatGPT Guide for Beginners to Experts!
Coursera: Generative AI for Everyone
Towards AI (practical AI cases)
AI Weirdness (fun and weird AI cases)

#3 Create a business case template

This is where AI software development actually starts.

Making software with AI is like navigating a winding path in the mountains with a friend who’s fitter and more athletic than you can ever dream of being. Except your friend is blind, and for some strange reason, you refuse to walk. So, you’ll have to guide your friend’s every step as he carries you towards Mount Doom.

It’s a good idea to know where exactly you’d like to arrive.

To start the project, create a template for business cases that are easy to convert into code. Here’s one such template I used to build the Service Call app. Just make a copy of the spreadsheet to edit the AI prompts:

Few people can answer all of these questions right away, especially if they lack certain technical skills. To solve this problem, ask the chat to generate a list of questions for a subject-matter expert. Then, direct the AI to answer these questions.

#4 Turn the template into a software specification

The answers you get from AI can serve as a basis for a technical specification.

I was surprised to see that even a free-form description of a business case without technical details can provide a fairly good architecture description: endpoints, models, file structure/organization, and all. The AI even suggested using JWT, although the original request had no authorization details.

ChatGPT saves a lot of time on simple tasks. Instead of searching for a similar case, then verifying and testing the proposed solution, you can ask the AI directly. For example, when a package fails to install in a dev environment, AI provides debugging advice along with a solution and an explanation.

I have rather basic knowledge of backend development and CI/CD setup. So, I asked the AI to use this template to create a document that describes my project.

The “interview me” approach transforms AI from a passive tool into an active collaborator. It essentially plays the role of a curious, well-informed interviewer. Instead of you explaining the entire project, the AI starts with broad questions, then drills down into more specific ones based on your responses.

For instance, in the Service Call project, which required Stripe and Daily.co integrations alongside custom user flows, the AI posed questions like, “Have you considered using an existing package for this integration, or would custom development be more suitable?” This triggered ideas I hadn’t yet explored.

As the conversation progresses, the AI consolidates the insights to propose a well-tailored solution.

All in all, AI is a great tool for producing a step-by-step description of tasks with known technical aspects. It’s also convenient and fast for idea validation. Here’s a query template I use for this purpose:

Environment description (OS, packages, internal/external services, versions).
Challenge/problem/error description.
Preferred options (how to debug, log, fix, etc.) and the expected result.
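
If you reuse this template often, you can assemble the query in code. Below is a minimal TypeScript sketch. The DebugQuery shape, the function name, and the section labels are my own assumptions for illustration, not a required format.

```typescript
// Hypothetical shape for the three-part query template described above.
interface DebugQuery {
  environment: Record<string, string>; // e.g. OS, packages, service versions
  problem: string;                     // challenge, problem, or error text
  preferences: string[];               // how to debug, log, fix; expected result
}

// Render the query as plain text with one labeled section per part.
function buildDebugPrompt(query: DebugQuery): string {
  const envLines = Object.keys(query.environment).map(
    (item) => `- ${item}: ${query.environment[item]}`
  );
  const preferenceLines = query.preferences.map((option) => `- ${option}`);
  return [
    "Environment:",
    ...envLines,
    "",
    "Problem:",
    query.problem,
    "",
    "Preferred options and expected result:",
    ...preferenceLines,
  ].join("\n");
}
```

Keeping the three sections in a fixed order makes it harder to forget the environment details, which are the part most often left out.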

#5 Generate code with AI

To make software with AI in as few iterations as possible, I recommend the following process:

Describe the engineering level the AI should assume. For example, “As a middle Full-stack JS developer, create…”
Provide the existing requirements (JSX or TSX, installed modules, an example of an existing component).
Request a step-by-step instruction instead of a code snippet.
Generate code snippets for each of the steps one by one.
Iterate and refine.
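
The first four steps can be sketched as two small prompt builders. This is only an illustration in TypeScript: the type, the function names, and the exact wording of the prompts are assumptions, and you would still paste the AI's plan back in by hand.

```typescript
// Hypothetical inputs for the first prompt in the process above.
interface GenerationRequest {
  role: string;           // engineering level the AI should assume
  requirements: string[]; // e.g. TSX, installed modules, an existing component
  task: string;           // what needs to be built
}

// Prompt 1: ask for a numbered plan instead of code.
function buildPlanPrompt(request: GenerationRequest): string {
  const requirementLines = request.requirements.map((item) => `- ${item}`);
  return [
    `As a ${request.role}, plan how to ${request.task}.`,
    "Existing requirements:",
    ...requirementLines,
    "Reply with numbered step-by-step instructions only. Do not write code yet.",
  ].join("\n");
}

// Prompts 2..n: request the code for one step of the returned plan at a time.
function buildStepPrompts(steps: string[]): string[] {
  return steps.map(
    (step, index) =>
      `Generate the code for step ${index + 1} of ${steps.length}: ${step}`
  );
}
```

Sending one step prompt at a time keeps each answer small enough to review before moving on.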

Here are a few important details you should know to improve the usefulness of AI in software development.

Provide AI with sufficient details

Before starting, be sure to describe your project, including its structure and software versions (frameworks, packages, and libraries). Specify the language, technologies, initial data, and so on. If necessary, provide links to new versions of packages or implementation approaches.

If the chat doesn’t produce the desired result, it’s often better to start a new dialog, even in the middle of a long discussion. This helps the AI reset and respond more accurately to your questions.

Project context like this allows generative AI to suggest better options both initially and later on. For example, modern JS framework projects use many packages, and their number rises significantly with AI-generated code. We recommend informing the AI model about existing packages by sharing the package.json file.
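
Pasting the whole package.json works, but a short summary is often enough. Here is a minimal TypeScript sketch that turns the file's text into a context block for a prompt. The function names and the wording of the first line are assumptions, and only the dependencies and devDependencies fields are read.

```typescript
// Minimal subset of package.json fields used here.
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Turn one dependency map into sorted "name@version" strings.
function listPackages(deps: Record<string, string> = {}): string[] {
  return Object.keys(deps)
    .sort()
    .map((pkg) => `${pkg}@${deps[pkg]}`);
}

// Build a short context block to paste at the top of a prompt.
function describeInstalledPackages(packageJsonText: string): string {
  const manifest = JSON.parse(packageJsonText) as PackageManifest;
  const runtime = listPackages(manifest.dependencies);
  const dev = listPackages(manifest.devDependencies);
  return [
    "Use only the packages already installed unless told otherwise.",
    `Dependencies: ${runtime.join(", ") || "none"}`,
    `Dev dependencies: ${dev.join(", ") || "none"}`,
  ].join("\n");
}
```

The explicit first line nudges the AI away from pulling in yet another package for something an installed one already covers.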

Note that ChatGPT doesn’t know about updates or technologies that appeared after its training cutoff. However, you can ask it to read the new release documentation and then work with it to solve the problem.

Structure your prompts correctly

Here are two example prompts you might use with AI in software engineering:

Code generation request (Create a bash script to import DB dump.)
Data analysis request (Create a script that will validate and save JSON and save files if there are links to images in JSON. Also, here is a package.json).

Both queries generate content. In the first case, however, the result format is not as important. My example doesn’t require any additional specifics for the script to work.

In the second case, the implementation matters. The prompt has extra requirements for code generation. Although package.json doesn’t contain all the necessary information, it greatly reduces the number of iterations needed to generate production-ready code.
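
To show why the implementation matters in the second case, here is a rough TypeScript sketch of the core of such a script: validating the JSON and finding image links. It's my own illustration, not the code the AI generated. The list of image extensions is an assumption, links with query strings are not matched, and downloading and saving the files is left out.

```typescript
// File extensions treated as images; an assumption for this sketch.
const IMAGE_URL_PATTERN = /^https?:\/\/\S+\.(png|jpe?g|gif|webp|svg)$/i;

// Recursively collect every string value that looks like an image URL.
function collectImageLinks(value: unknown, found: string[] = []): string[] {
  if (typeof value === "string") {
    if (IMAGE_URL_PATTERN.test(value)) found.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectImageLinks(item, found);
  } else if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) collectImageLinks(record[key], found);
  }
  return found;
}

// Validate the raw text as JSON and report image links, or the parse error.
function analyzeJson(text: string): { valid: boolean; imageLinks: string[]; error?: string } {
  try {
    const parsed: unknown = JSON.parse(text);
    return { valid: true, imageLinks: collectImageLinks(parsed) };
  } catch (err) {
    return { valid: false, imageLinks: [], error: String(err) };
  }
}
```

Even this small piece involves choices (which extensions count, how deep to search, what to return on invalid input) that a one-line prompt leaves open.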

AI models write almost all necessary code for simple requests, such as “Create React component to upload files, parse content and send text to XXX API”.

However, each of the components/modules is very basic. The UI is CSS-free and there’s no error handling. Most features are implemented as if each of them lives separately. In other words, the proposed code will resemble generic examples of similar features.

So, I recommend making your instructions as structured as possible.

For example, if you have data from a brainstorming session (after speech-to-text conversion), set clear requirements for data analysis and the expected result.

Use a pseudocode language for consistent output

AI is lazy. Like an unpaid intern, it spends minimum resources to generate output. If you don’t specify requirements and the needed amount of code, AI often saves resources by skipping some of its parts.

Instead of requesting the whole snippet, ask AI to provide step-by-step instructions. Ask what’s required to do the job, then ask how to do each step. For example:

“What files are required to organize Stripe integration?
What packages to install?
What should be in “@types”?
What should the API endpoint logic look like?
And so on…”

One of the problems with AI in software development is the output variability.

Unless you adhere to common standards, the generated code is significantly different every time. This is a major problem when monitoring features in the repository. Component changes will be significant, negatively affecting the code review.

To solve the problem of inconsistent output, I first asked AI to create a pseudocode language I could later use to describe the desired code. This language is a description template that ensures almost identical results whenever you need a component. This also makes it much easier to explain development standards to AI.

AI creates a pseudo-language, describes functionality using the new language, and generates code with minimal intervention.
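
The pseudo-language itself is whatever you and the AI agree on, so I can only show the idea. The TypeScript sketch below uses a made-up format (COMPONENT, PROP, STATE, RULE, END) and a hypothetical ComponentSpec type. The point is that the same description always renders to the same text, whatever order you typed the props in.

```typescript
// Hypothetical description template; the keywords below are invented
// for this sketch and are not the pseudo-language used on the project.
interface ComponentSpec {
  title: string;                 // component name
  props: Record<string, string>; // prop name -> type
  state: string[];
  behavior: string[];
}

// Render the spec in a fixed field order so repeated requests read identically.
function renderSpec(spec: ComponentSpec): string {
  const props = Object.keys(spec.props)
    .sort()
    .map((prop) => `  PROP ${prop}: ${spec.props[prop]}`);
  const state = spec.state.map((item) => `  STATE ${item}`);
  const behavior = spec.behavior.map((rule, i) => `  RULE ${i + 1}: ${rule}`);
  return [`COMPONENT ${spec.title}`, ...props, ...state, ...behavior, "END"].join("\n");
}
```

A stable description like this is easier to diff in code review than two free-form prompts that say roughly the same thing.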

Think about deployment as early as possible

When you start thinking about how to build an app using AI, deployment isn’t one of the first things that comes to mind. However, this can cost you a lot in terms of development time.

My recommendation would be to consider deployment during code generation. This is especially important regarding API keys, tokens, databases, and other things that should be managed appropriately.

While adding layers of code, logic, and sample data, be sure to ask AI for recommended deployment practices. You can do this later, too. However, you could miss parts of the code that need to be included in the CI/CD setup (e.g., a seed file with sample data or a webhook URL).
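
One habit that helps here is keeping keys and URLs out of the generated code and failing fast when one is missing. Below is a minimal TypeScript sketch of that idea. The variable names are hypothetical examples, not the actual Service Call configuration.

```typescript
// Hypothetical variable names; use whatever your services actually require.
const REQUIRED_VARIABLES = ["DATABASE_URL", "STRIPE_SECRET_KEY", "WEBHOOK_URL"];

type Environment = Record<string, string | undefined>;

// Return the names of required variables that are missing or empty.
function findMissingVariables(
  env: Environment,
  required: string[] = REQUIRED_VARIABLES
): string[] {
  return required.filter((key) => !env[key]);
}

// Fail fast at startup instead of discovering a missing secret in production.
function loadConfig(env: Environment): Record<string, string> {
  const missing = findMissingVariables(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config: Record<string, string> = {};
  for (const key of REQUIRED_VARIABLES) {
    config[key] = env[key] as string;
  }
  return config;
}
```

In a Node.js service you would call loadConfig(process.env) once at startup. The same list of names doubles as a checklist when you ask AI to write the CI/CD configuration.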

#6 Refine the AI-generated code

With basic prompts, AI coding output for popular programming languages reaches a Strong Junior Developer’s (L2) level. Enhancing your prompts with conditions, structure, patterns, and so on helps generate higher-level code.

However, you shouldn’t expect a Senior-level (L5) code output. We’ve tested AI with users of various experience levels—from Strong Junior Developers (L2) to Middle (L3), Strong Middle (L4), and Senior Developers (L5).

The results show that the code output is typically one level below the user’s skills.

Generating low-quality “write-only” code will complicate any future work on the project. Your company might also have its own standards, best practices, and code norms.

This means you’ll need to spend time refining the code. Editing the initial prompt or adding new requirements to the already generated code (in the format of chat, request → updated version) is fine.

However, you should never spend more time editing the prompts than you’d spend coding the thing yourself. The right amount of time for code revisions depends on task complexity and project type.

Task complexity is simple. The easier the task for a meatbag, the faster the meatbag can formulate requirements and validate the result. For the project type, you can divide most use cases into two categories:

PoC (Proof of Concept)

This is a project that’s only meant to verify an idea, paving the way for a more sophisticated solution.

PoC focuses on quick results. Basic code requirements are enough for 90% of cases.

Approaches like creating tests and asking AI to generate code that will pass the tests may be fun but likely too controversial.

MVP (Minimum Viable Product)

MVP is a project that will expand in the future despite the use of AI. Because MVPs have more stringent code generation requirements, you’ll need to spend time on refactoring and optimization.

For example, when adding JWT (JSON Web Token) authorization, the token logic should be immediately separated into its own service. This service can later be used in URL tokenization for a payment gateway (for example, by adding a transaction number to the URL).
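
Here is a rough TypeScript sketch of that separation. The TokenService interface and the function names are assumptions for illustration. The demo implementation does not sign anything and is not a real JWT, so it only stands in for whatever JWT library the project uses.

```typescript
// Contract for the separate token service; auth and payments both depend on it.
interface TokenService {
  issue(claims: Record<string, string>): string;
  verify(token: string): Record<string, string> | null;
}

// Placeholder implementation for illustration only: it encodes claims as JSON
// and does NOT sign them. A real service would issue signed JWTs.
class UnsignedDemoTokenService implements TokenService {
  issue(claims: Record<string, string>): string {
    return encodeURIComponent(JSON.stringify(claims));
  }

  verify(token: string): Record<string, string> | null {
    try {
      return JSON.parse(decodeURIComponent(token)) as Record<string, string>;
    } catch {
      return null; // malformed token
    }
  }
}

// Payment code only needs the service contract, not the token format.
function buildPaymentUrl(
  baseUrl: string,
  transactionId: string,
  tokens: TokenService
): string {
  const token = tokens.issue({ transactionId });
  return `${baseUrl}?token=${encodeURIComponent(token)}`;
}
```

Because buildPaymentUrl depends only on the interface, swapping the placeholder for a real JWT implementation doesn't touch the payment code.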

#7 Add new functionality/refactor the code

As I mentioned, the PoC approach doesn’t lend itself to post-release improvements. MVP, however, is a different story altogether.

When extending an AI-generated MVP or a regular SaaS product, a developer needs to dive into the project architecture. Infrastructure as Code (IaC) is a very useful trend for this purpose. You can get step-by-step deployment instructions by asking AI about the content of your docker-compose.yml file.

If you have additional services such as Prisma (a toolkit for working with databases and ORM for Node.js and TypeScript), AI can provide a set of commands for working with the database if asked in the context of the docker-compose.yml file.

With undocumented SaaS applications, AI can analyze the structure and logic much faster than humans. You can ask it to create a quick prototype of a new feature or implement it as a separate service via APIs instead of integrating it directly into the existing code.

Uniformity is key in large projects. Linters and code formatting rules are a must, especially for scripting languages. However, they do not guarantee uniform style.

A developer should revise the code of an existing module to extend its functionality. AI handles this task pretty well. You can provide AI with the existing code and ask it to extend it in the same style.

Working with a single component is a linear task, both when generating and refactoring the code. AI tools are pretty good at such tasks. Creating a feature that requires logic, visualization, and database work is also not a problem. AI will provide detailed instructions on adding code to specific files. Of course, this requires a description of the project structure.

Difficulties arise when trying to refactor a project.

This requires you to provide a detailed description of the project, its structure, all the necessary files, and the expected result. When dealing with a large amount of diverse information, AI tends to go the lazy way and either focus on one aspect or produce a result similar to the input.

Conclusion

AI is a powerful tool for software developers. Like a V8 muscle car, it needs a GPS navigator and an experienced driver to not end up in the middle of nowhere. In other words, AI-driven projects require you to have at least some technical knowledge and coding skills.

The step-by-step process described in the article allows you to bridge most gaps in knowledge and produce high-quality code output.

Empirically, we discovered that the best results and cost savings arise when the AI user is at least a Strong Middle (L4) Developer. Otherwise, the execution level is pretty low overall, requiring you to rewrite parts of the project.

AI is a fast-changing field. With new releases happening almost daily, the entry barrier to AI software development will likely decrease. So keep an eye on the new and emerging models.

I hope you found the article useful. If you have some questions or need help with your project, don’t hesitate to contact us.
