DoiT Cloud Intelligence™

DeepSeek: China’s Sputnik Moment in AI — Impressive, But Is It Practical for Your Business?

By Eduardo Mota | Jan 29, 2025 | 8 min read

The AI world is buzzing with the release of DeepSeek, a new large language model (LLM) from China. Much like the Soviet Union’s Sputnik satellite launch in 1957, DeepSeek has sent a shockwave through the industry, demonstrating an impressive new architecture and raising questions about the future of AI development. But beneath the hype, what does DeepSeek really mean for businesses looking to leverage the power of LLMs? Is it a game-changer, or a proof of concept destined to be quickly outpaced?

The DeepSeek Difference: A Mesh of Experts

DeepSeek stands out due to three key innovations:

  1. Mixture of Experts (MoE) Execution: Instead of a single monolithic network, DeepSeek employs a “mesh” of smaller, specialized expert sub-networks. For each input, a gating network activates only a relevant subset of these experts (and their parameters). This makes the model significantly more efficient in terms of computational resources.
  2. Cold-Start Data for Enhanced Reasoning: DeepSeek leverages a small dataset of high-quality, human-annotated chain-of-thought examples to fine-tune the model before applying reinforcement learning. This cold-start data not only improves the model’s readability but also enhances its reasoning capabilities by providing a strong foundation for subsequent RL training. This approach demonstrates the potential of combining human expertise with reinforcement learning to develop more effective reasoning models.
  3. Reinforcement Learning for Reasoning Enhancement: DeepSeek employs a multi-stage reinforcement learning process to enhance the model’s reasoning capabilities. This process involves training the model on a diverse set of reasoning tasks, including coding, mathematics, science, and logical reasoning, using rule-based rewards to guide the learning process. The use of RL allows the model to autonomously explore and develop effective reasoning strategies, leading to significant improvements in its performance on complex reasoning tasks.
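To make the MoE idea concrete, here is a minimal sketch of top-k expert routing. This is an illustration of the general technique, not DeepSeek’s actual implementation: a gating network scores every expert, and only the top-scoring experts are executed for a given input.

```python
# Illustrative MoE routing sketch (not DeepSeek's real code): each
# "expert" is a small linear layer, and a gate picks which experts run.
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, DIM, TOP_K = 8, 4, 2

experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
gate = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x):
    scores = x @ gate                     # gate scores, one per expert
    top = np.argsort(scores)[-TOP_K:]     # select only the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over selected experts
    # Only TOP_K of NUM_EXPERTS experts execute; the rest stay idle,
    # which is where the compute savings come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(DIM)
y = moe_forward(x)
print(y.shape)  # (4,)
```

The key point is that compute per input scales with the number of *active* experts (here 2), not the total parameter count (here 8 experts’ worth).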
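The “rule-based rewards” mentioned above can be sketched very simply. This is a hedged illustration of the general approach (the answer format and exact-match rule are assumptions, not DeepSeek’s actual reward code): the model’s final answer is extracted and checked against a verifiable ground truth.

```python
# Sketch of a rule-based reward for verifiable tasks like math:
# extract the model's final answer and compare it to the known result.
import re

def rule_based_reward(completion: str, ground_truth: str) -> float:
    # Assume answers arrive in a \boxed{...} span, a common convention.
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0  # no parseable answer: no reward
    answer = match.group(1).strip()
    return 1.0 if answer == ground_truth.strip() else 0.0

print(rule_based_reward(r"... so the result is \boxed{42}", "42"))  # 1.0
print(rule_based_reward("I'm not sure about this one", "42"))       # 0.0
```

Because the reward is computed by a rule rather than a learned reward model, it is cheap to evaluate and hard for the model to game, which is what makes large-scale RL on reasoning tasks tractable.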

The Security Elephant in the Room

As with any new technology, especially one originating from a nation with a complex geopolitical landscape, security concerns are paramount. While DeepSeek is open-source, allowing the community to scrutinize its code for potential biases, loopholes, or security risks, the origin alone raises some red flags.

Practicality: Where the Hype Meets Reality

While DeepSeek’s architecture is groundbreaking, its practicality for most businesses is currently limited. Here’s why:

  • Resource Intensive: Running the full DeepSeek R1 model requires a significant investment in expensive GPUs. This puts it out of reach for many organizations.
  • API Concerns: Using the DeepSeek API, while more accessible, comes with data privacy implications. DeepSeek’s terms indicate they may use your input data for model improvement, which is a non-starter for many companies with sensitive data, and any data it captures is stored in China.
  • Smaller Model, Diminished Quality: Deploying a smaller version of DeepSeek is possible but results in a noticeable drop in performance compared to R1, making it less competitive with existing managed services.

Running DeepSeek Securely: The Cloud Advantage

For those determined to experiment with DeepSeek, the safest approach is to deploy it within a controlled cloud environment like AWS, GCP, or Azure. This allows you to maintain full control over your data and infrastructure, mitigating some of the security risks associated with open-source models, particularly those with origins that raise concerns. While this approach is more expensive than using managed services like Amazon Bedrock or Google’s Gemini, it offers the highest level of security and control.

DeepSeek vs. the Competition: A Comparison

Let’s be realistic. While DeepSeek is impressive, it’s unlikely to dethrone established players like Anthropic’s Claude, OpenAI’s GPT-4, or Google’s Gemini anytime soon. These models have a head start in terms of performance, ecosystem, and most importantly, enterprise adoption.

We conducted small tests by running DeepSeek Distilled models on Amazon SageMaker and compared their performance to existing models within AWS’s ecosystem. The results were clear:

  • Performance Gap: While DeepSeek showed promise, it still lagged behind the leading models in terms of accuracy, fluency, and overall utility in various business scenarios.
  • Reasoning Quirks: DeepSeek exhibited strong reasoning capabilities. However, it had a tendency to second-guess itself, leading to circular reasoning patterns. While this can sometimes help it arrive at the correct answer, it can also confuse the model. Here’s an example:
"inputs": "What's the capital of France?"
"generated_text": "What's the capital of France? The Eiffel Tower. Now, wait. Why is that? Because Eiffel Tower is such a symbol for Paris, right? Yeah, so the capital of France is Paris.\n\nHmm, but wait. Before that, do you get confused sometimes about the basics? Maybe, but let's think. So France is a country in Western Europe. Its capital is Paris, which is famous for its historical landmarks.\n\nOkay, so maybe the confusion comes when people mix up different cities or think the"
  • Cost-Effectiveness: When factoring in the high infrastructure costs associated with running DeepSeek, managed services like Bedrock and Gemini offer a more cost-effective solution for most use cases.
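A back-of-the-envelope comparison makes the cost-effectiveness point concrete. All prices below are illustrative assumptions for the sake of the arithmetic, not actual AWS or Google quotes: self-hosting carries a fixed hourly GPU cost whether or not you use it, while managed services bill per token.

```python
# Illustrative cost comparison (prices are assumptions, not real quotes):
# self-hosted GPU capacity is billed by the hour; managed LLM services
# are billed per token processed.
GPU_HOURLY_USD = 12.0          # assumed price for a large GPU instance
MANAGED_PER_1K_TOKENS = 0.003  # assumed managed-service price per 1K tokens

def self_hosted_cost(hours: float) -> float:
    return GPU_HOURLY_USD * hours

def managed_cost(tokens: int) -> float:
    return MANAGED_PER_1K_TOKENS * tokens / 1000

# At moderate volume, the managed service wins by a wide margin:
monthly_tokens = 50_000_000
print(managed_cost(monthly_tokens))   # 150.0 (per month)
print(self_hosted_cost(24 * 30))      # 8640.0 (per month, always-on)
```

The crossover only arrives at very high, sustained token volumes, which is why managed services remain the pragmatic default for most workloads.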

We also ran tests on the hosted DeepSeek R1 and compared it with closed-source models, giving both models the same data and the same prompt.

  • Code Generation: Not Yet a Silver Bullet. We further tested DeepSeek’s code generation capabilities by asking it to create Terraform code for setting up an Amazon Connect instance. While the model demonstrated impressive reasoning and self-verification, the generated code still contained errors. In contrast, Claude 3.5 Sonnet v2 produced comparable code in a fraction of the time.

Here’s the code generated by DeepSeek R1 (full model):

In particular, the permissions for the security profile are not in the format that the service requires.

provider "aws" {
  region = "us-west-2"
}
resource "aws_connect_instance" "main" {
  identity_management_type = "CONNECT_MANAGED"
  inbound_calls_enabled     = true
  outbound_calls_enabled    = true
  instance_alias            = "my-connect-instance"
}
resource "aws_connect_security_profile" "admin" {
  instance_id = aws_connect_instance.main.id
  name        = "AdminSecurityProfile"
  permissions = [
    "Users-Create", "Users-View", "Users-Edit", "Users-Delete",
    "SecurityProfiles-View", "SecurityProfiles-Create",
    "SecurityProfiles-Edit", "SecurityProfiles-Delete"
  ]
}
resource "aws_connect_hours_of_operation" "default" {
  instance_id = aws_connect_instance.main.id
  name        = "24-7"
  config {
    day = "MONDAY"
    start_time { hours = 0, minutes = 0 }
    end_time { hours = 23, minutes = 59 }
  }
  time_zone = "UTC"
}
resource "aws_connect_queue" "default" {
  instance_id           = aws_connect_instance.main.id
  name                  = "DefaultQueue"
  hours_of_operation_id = aws_connect_hours_of_operation.default.id
}
resource "aws_connect_routing_profile" "admin" {
  instance_id               = aws_connect_instance.main.id
  name                      = "AdminRoutingProfile"
  default_outbound_queue_id = aws_connect_queue.default.id
  media_concurrencies {
    channel     = "VOICE"
    concurrency = 1
  }
}
resource "aws_connect_user" "admin" {
  instance_id         = aws_connect_instance.main.id
  name                = "AdminUser"
  password            = "ChangeMe123!" # Use AWS Secrets Manager in production
  security_profile_ids = [aws_connect_security_profile.admin.id]
  routing_profile_id  = aws_connect_routing_profile.admin.id
  phone_config {
    phone_type       = "SOFT_PHONE"
    auto_accept      = true
    after_contact_work_time_limit = 0
  }
}

And here’s the code generated by Claude 3.5 Sonnet v2:

The permissions for the security profile are in the correct format, but many are not required or do not exist.

provider "aws" {
  region = "us-west-2"
}
resource "aws_connect_instance" "contact_center" {
  instance_alias                 = "contact-center"
  identity_management_type       = "CONNECT_MANAGED"
  inbound_calls_enabled         = true
  outbound_calls_enabled        = true
  contact_lens_enabled          = true
  auto_resolve_best_voices_enabled = true
}
resource "aws_connect_security_profile" "admin" {
  instance_id = aws_connect_instance.contact_center.id
  name        = "Admin"
  description = "Administrator security profile"
  permissions = [
    "BasicAgentAccess",
    "OutboundCallAccess",
    "AccessMetrics",
    "ManageMetrics",
    "ManageUsers",
    "ManageSecurityProfiles",
    "ManageRoutingProfiles",
    "ManageQuickConnects",
    "ManageContactFlows",
    "ManageHoursOfOperation"
  ]
}
resource "aws_connect_user" "admin" {
  instance_id        = aws_connect_instance.contact_center.id
  name              = "admin"
  password          = "YourSecurePassword123!" # Change this to a secure password
  routing_profile_id = aws_connect_routing_profile.admin.routing_profile_id
  security_profile_ids = [aws_connect_security_profile.admin.security_profile_id]

  identity_info {
    first_name = "Admin"
    last_name  = "User"
    email      = "[email protected]" # Change this to your email
  }
  phone_config {
    phone_type = "SOFT_PHONE"
  }
}
resource "aws_connect_routing_profile" "admin" {
  instance_id               = aws_connect_instance.contact_center.id
  name                     = "Admin"
  description              = "Admin routing profile"
  default_outbound_queue_id = aws_connect_queue.default.queue_id

  media_concurrencies {
    channel     = "VOICE"
    concurrency = 1
  }
}
resource "aws_connect_queue" "default" {
  instance_id = aws_connect_instance.contact_center.id
  name        = "DefaultQueue"
  description = "Default outbound queue"
  hours_of_operation_id = aws_connect_hours_of_operation.default.hours_of_operation_id
}
resource "aws_connect_hours_of_operation" "default" {
  instance_id = aws_connect_instance.contact_center.id
  name        = "Default Hours"
  description = "24/7 Operation"

  config {
    day = "MONDAY"
    end_time {
      hours   = 23
      minutes = 59
    }
    start_time {
      hours   = 0
      minutes = 0
    }
  }
}

This example highlights that while DeepSeek R1 is a powerful model, it’s not yet a replacement for established models in all scenarios.

A Niche Advantage: Fine-Tuning and Distillation

While DeepSeek may not be the best choice for many organizations due to its hosting requirements, it offers a compelling advantage for a specific subset of organizations: those engaged in fine-tuning or creating distilled models for specialized tasks. Here’s why:

  • Reduced Memory Footprint: DeepSeek’s MoE execution method can significantly reduce the amount of GPU memory required for fine-tuning or running the full R1 version of the model. This can lead to substantial cost savings, especially for resource-constrained projects.
  • Improved Output Quality: In some cases, DeepSeek’s reinforcement-learning training may lead to improved output quality, because a smaller set of experts can be trained more effectively for a given domain.
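The efficiency argument is easy to quantify. Using the publicly reported figures for DeepSeek R1 (roughly 671B total parameters, of which roughly 37B are activated per token), only a small fraction of the model does work on any given token:

```python
# Why MoE lowers the active compute per token: only a fraction of the
# total parameters are activated. Figures below are the publicly
# reported totals for DeepSeek R1 (approximate).
TOTAL_PARAMS_B = 671   # ~671B total parameters
ACTIVE_PARAMS_B = 37   # ~37B parameters activated per token

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"{active_fraction:.1%} of parameters active per token")  # 5.5%
```

Note that all parameters still need to be resident somewhere for inference, so the savings show up mainly in per-token compute (and therefore fine-tuning throughput) rather than in raw storage.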

What Does This Mean for Your Business?

DeepSeek is a significant development in AI, but it’s not a silver bullet for your business needs. For most enterprises, the following holds true:

  • Managed Services Remain a Strong Choice: Services like Bedrock, Gemini, and others provide a robust, secure, and cost-effective way to integrate LLMs into your operations. I expect demand for models like DeepSeek R1 to drive their availability in Bedrock, as happened with Llama 3, offering a secure way to leverage the model.
  • Focus on Practical Applications: Instead of getting caught up in the hype of the newest model, prioritize solutions that address your specific business challenges using proven technologies.
  • Consider DeepSeek for Specialized Use Cases: If your organization is actively involved in fine-tuning or distilling LLMs, DeepSeek’s MoE approach could offer significant cost and performance benefits.
  • Keep an Eye on Future Developments: DeepSeek’s architecture will undoubtedly influence the next generation of LLMs. Expect to see similar MoE approaches and curated data training techniques adopted by leading AI labs in the near future.

Conclusion: A Glimpse into the Future

DeepSeek is like Sputnik — a powerful demonstration of what’s possible, but not necessarily a practical tool for immediate widespread use inside organizations. It’s a sign of rapid innovation in the AI field and a harbinger of the advancements to come. For now, businesses should focus on leveraging the robust and secure LLM solutions already available, keep a close watch on the evolving landscape, and consider DeepSeek for specialized use cases. The real advancements will come from applying these technologies strategically to solve real-world problems.

Are you ready to unlock the power of LLMs for your business? Contact us today at https://www.doit.com/services to explore how we can help you implement secure and efficient AI solutions using industry-leading platforms like Amazon SageMaker and Amazon Bedrock.