I stumbled upon CrewAI a few weeks ago through the course “Practical Multi AI Agents and Advanced Use Cases with crewAI“. While I was initially hesitant to try it since I’m already using Langgraph for my agent Sydney, two things sparked my interest:
- I’ve been exploring ways to create podcasts from scratch, particularly focusing on how to use Gen AI to “reason” over my 15 years of blog content rather than just copying and pasting posts. I had tried NotebookLM for this (which worked okay; you can check out the podcast episode here), but I wanted more control over conversation flow and opening hooks.
- I wanted to learn something new and experiment with OpenAI’s text-to-speech models.
Table of Contents
- 1. The good stuff
- 2. The Reality Check: It’s Not Just “Click and Create”
- 3. Show Me The Results!
- Final thoughts
After about 10 days of playing with CrewAI, here are my key observations:
1. The good stuff
1.1 Surprisingly easy to get started
- The initial learning curve is remarkably short – I was up and running in just a few hours
- Setting up custom tools (like retrieving content from my blog using Weaviate as the vector store) was straightforward. Turning your podcast script into audio output is pretty easy to set up too.
- The ability to describe agents and their tasks in plain English using YAML files is powerful (Pro tip: Visual Studio Code’s autocomplete is super helpful here!)
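For context, a CrewAI agent definition in YAML looks roughly like this. The role, goal, and backstory text below are illustrative placeholders, not my actual prompts:

```yaml
# agents.yaml -- illustrative sketch of a CrewAI agent definition
content_researcher:
  role: >
    Blog Content Researcher
  goal: >
    Find and summarize the most relevant posts on a given topic
  backstory: >
    You are a meticulous researcher who digs through 15 years of
    blog archives to surface the strongest supporting material.
```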
1.2 Flexible model selection
Switching between different LLMs is as simple as updating your crew.py:
from crewai import LLM

llm_openai_4o_mini = LLM(model="gpt-4o-mini", temperature=0)
llm_anthropic_35 = LLM(model="claude-3-5-sonnet-20240620", temperature=0)
llm_openai_4o = LLM(model="gpt-4o", temperature=0)
llm_gemini_15_pro = LLM(model="gemini/gemini-1.5-pro-002", temperature=0)
You can then assign specific models to different agents based on their strengths. For example,
@agent
def content_researcher(self) -> Agent:
    return Agent(
        config=self.agents_config['content_researcher'],
        llm=llm_anthropic_35,
        tools=[BlogContentRetrievalTool()],
        verbose=True
    )
All of this gives me a great deal of control over how my podcast is structured and scripted.
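For illustration, a custom retrieval tool like the BlogContentRetrievalTool assigned above boils down to a class with a run method. Here is a minimal sketch with the Weaviate client swapped for an in-memory stub; the names and structure are assumptions for readability, not my production code (a real version would subclass CrewAI’s BaseTool and query Weaviate):

```python
# Minimal sketch of a custom retrieval tool. In real code this would
# subclass crewai.tools.BaseTool and query a Weaviate collection;
# here the vector store is an in-memory stub so the shape is easy to see.

class StubVectorStore:
    """Stands in for a Weaviate collection: naive keyword matching."""

    def __init__(self, documents):
        self.documents = documents

    def search(self, query, limit=3):
        terms = query.lower().split()
        scored = [
            (sum(term in doc.lower() for term in terms), doc)
            for doc in self.documents
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in scored[:limit] if score > 0]


class BlogContentRetrievalTool:
    name = "Blog Content Retrieval"
    description = "Retrieve relevant blog posts for a research query."

    def __init__(self, store):
        self.store = store

    def _run(self, query: str) -> str:
        hits = self.store.search(query)
        if not hits:
            return "No relevant posts found."
        return "\n---\n".join(hits)


store = StubVectorStore([
    "Post: How I built my home lab network",
    "Post: Lessons from 15 years of blogging",
    "Post: Getting started with vector databases",
])
tool = BlogContentRetrievalTool(store)
print(tool._run("vector databases"))
# -> Post: Getting started with vector databases
```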
1.3 Text-to-Speech: Promise and Limitations
While OpenAI’s text-to-speech API is impressive in quality, it currently offers only six voice models. For podcast creation, this is quite limiting – especially when you’re trying to create engaging conversations between multiple hosts. The lack of voice variety means you might end up with podcasts that sound similar to others using the same technology. This is definitely an area where I hope to see improvement in the future, either through OpenAI expanding their voice options or through integration with other text-to-speech providers.
I can understand the AI safety concerns as well so the different AI labs may not be too hasty in providing too many voice models.
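To make the six-voice limit concrete: OpenAI’s current TTS voices are alloy, echo, fable, onyx, nova, and shimmer, so with multiple hosts you quickly run out of distinct voices. A small sketch of assigning voices to hosts (the host names are made up):

```python
# Assign one of OpenAI's six TTS voices to each podcast host.
# With more hosts than voices you'd be forced to reuse voices,
# which is exactly the limitation described above.

OPENAI_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]

def assign_voices(hosts):
    """Map each host to a distinct voice; raise if we run out."""
    if len(hosts) > len(OPENAI_VOICES):
        raise ValueError("More hosts than available voices")
    return dict(zip(hosts, OPENAI_VOICES))

voices = assign_voices(["Alex", "Jamie"])
# Each segment would then be synthesized with something like:
#   client.audio.speech.create(model="tts-1", voice=voices[host], input=text)
print(voices)  # {'Alex': 'alloy', 'Jamie': 'echo'}
```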
2. The Reality Check: It’s Not Just “Click and Create”
Initially, I worried this might contribute to the flood of AI-generated content (or “AI Slop”) we’re seeing online. After all, I could generate a 15-minute podcast script in about 5 minutes. (In the DeepLearning.AI training course above, João actually went through a code example of “Content creation at scale”.)
However, my perspective changed after reviewing the first few scripts the crew generated.
Creating high quality content still requires significant work!
2.1 Thoughtful Agent Structure
I had to revise the structure of the AI crew multiple times, adding additional roles, most importantly a “fact_checker”. My current podcast crew includes:
- Content researcher
- Script writer
- Fact checker
- Script editor
- Audio producer
2.2 Continuous Refinement
Success requires:
- Carefully defining each agent’s goals and tasks using industry-specific language. Someone with real podcasting experience can phrase each agent’s task in the language of the trade and ask for very specific deliverables; the output is much better that way.
- Being selective with tool access (more isn’t always better); agents can easily get stuck in continuous loops.
- Clear delegation rules between agents
- Specific output structure requirements
- Well-defined quality criteria of “What good work looks like” (I even got my daughter involved here – she’s the creative one!).
So again, as you can see, while a crew of AI agents speeds up my work significantly (at least 5x across research, scripting, fact checking, revision, and audio creation), it is still up to me to create high-quality content.
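As one example of what “specific output structure requirements” means in practice, I can ask the script writer to emit structured segments and validate them before they reach the audio step. A hedged sketch, with field names of my own choosing (not a CrewAI convention):

```python
# Validate that a generated script draft matches the structure the
# audio step expects: a list of segments, each with a known speaker
# and non-empty text. Field names here are illustrative.

ALLOWED_SPEAKERS = {"host", "co_host"}

def validate_script(segments):
    """Return a list of problems; an empty list means the draft is usable."""
    problems = []
    for i, seg in enumerate(segments):
        if seg.get("speaker") not in ALLOWED_SPEAKERS:
            problems.append(f"segment {i}: unknown speaker {seg.get('speaker')!r}")
        if not seg.get("text", "").strip():
            problems.append(f"segment {i}: empty text")
    return problems

draft = [
    {"speaker": "host", "text": "Welcome back to the show!"},
    {"speaker": "narrator", "text": "Previously..."},
]
print(validate_script(draft))  # ["segment 1: unknown speaker 'narrator'"]
```

CrewAI tasks can also enforce structured output directly (for example via Pydantic models), but a plain validation pass like this is enough to catch malformed drafts early.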
2.3 Model selection matters
Different LLM models have distinct “personalities” and varying levels of instruction-following, so you have to experiment to understand the strengths and weaknesses of each model and how they suit your needs at each step of the process.
Some observations:
- API responses can differ from web chat interface responses for the same model
- Currently, I prefer Anthropic models for long-form content using the API. However, when it comes to the web version, I actually think that claude-3-5-sonnet-20241022 and GPT-4o are on par.
- OpenAI’s o1-preview is my go-to for coding tasks
2.4 Feedback and Memory are Game-Changers
You have to provide feedback to your AI crew. They are good at following directions, but they do not know what you want and cannot read your mind (at least not yet, haha). The ability to train your crew through feedback is crucial.
With CrewAI, training your crew and giving feedback is quite simple; just run
crewai train -n <n_iterations> <filename> (optional)
While I haven’t fully explored CrewAI’s memory functions yet, the combination of feedback and memory seems incredibly powerful for creating consistent, high-quality output.
3. Show Me The Results!
Ok, ok – I hear you saying “Chandler, you’ve talked enough. Show me a sample of the podcast script generated by your AI crew!”
Here’s a complete workflow example:
- Research Phase: See how the Content Researcher agent analyzed and extracted key information from my blog posts
- Fact Check Summary: The Fact Checker’s detailed verification report
- Initial Script Draft: The Script Writer’s first take on the podcast conversation
- Final Polished Script: The Script Editor’s refined version with improved flow and engagement
- Listen to the Result: The final audio version produced by the Audio Producer agent
Each link above shows the progression from raw content to polished podcast, demonstrating how different agents contribute to the final product.
While I still have ideas for improving the pipeline further, I hope the above gives you a good sense of what is possible.
Final thoughts
CrewAI has impressed me with its balance of simplicity and power. While it makes content creation more accessible, it’s not a magic button – quality still requires expertise, careful planning, and continuous refinement. I’m excited to keep exploring its capabilities, especially in improving my podcast production workflow, though I hope to see improvements in the variety of available voice options for the final output.