5-Day Gen AI Intensive Course with Google: Capstone project!

My Experience with the 5-Day Gen AI Intensive Course with Google

I participated in a generative AI learning event hosted by Google.

The course materials were incredibly comprehensive (totaling over 300 pages) and allowed me to run and verify source code on Kaggle notebooks. Communication between organizers and participants was active through Discord and YouTube live streams.

The curriculum covered everything from basic principles to cutting-edge topics like Agents, providing a wide range of knowledge.

Curriculum

  1. Foundational Large Language Models & Text Generation
  2. Embeddings & Vector Stores
  3. Prompt Engineering
  4. Solving Domain-Specific Problems Using LLMs
  5. Agents
  6. Agents Companion
  7. Operationalizing Generative AI on Vertex AI using MLOps

While working full-time, completing the course in my spare time was quite challenging. I often worked on the materials late into the night, leading to sleep deprivation and tough days, but the depth of learning was worth it. Unfortunately, I couldn’t join the live streams as they were at 3:00 AM Japan time, which was disappointing.

Since all materials were in English, there was a language barrier, but I was able to learn effectively by sending texts and Kaggle screenshots to Gemini and ChatGPT for explanations.

Materials were delivered based on different themes over five days, and to complete the course, we had to create and submit a Capstone project (an AI-related application) using the knowledge we gained.

Here, I’d like to introduce my submission.

Capstone project: ArXiv Paper Summarizer

Link to notebook

Description

This script searches for recent papers on “Large language model” on arXiv, randomly selects one paper, downloads its PDF, extracts the text content, summarizes the text using the Google Gemini API (specifically gemini-2.0-flash), and outputs the summary in a predefined JSON format.

Flow

  1. Search arXiv for the latest 100 papers matching the query.
  2. Randomly select one paper from the search results.
  3. Download the PDF of the selected paper.
  4. Extract text from the PDF using PyPDF2.
  5. Truncate the text if it exceeds the input budget (MAX_INPUT_CHARS, a character count used as a rough proxy for the token limit).
  6. Send the extracted text and paper metadata to the Gemini API for summarization.
  7. Parse the Gemini response to get the JSON summary.
  8. Print the JSON summary to the console.
  9. Save the JSON summary to a file (summary.json) in the /kaggle/working/ directory.
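The search, selection, and truncation steps above are plain string and list work that can be sketched with the standard library alone. The helper names and the exact arXiv query parameters below are my assumptions for illustration, not the notebook's actual code:

```python
import random
import urllib.parse

MAX_INPUT_CHARS = 300_000  # approximate character budget for the model input

def build_arxiv_query(query: str = "Large language model",
                      max_results: int = 100) -> str:
    """Build a URL for the arXiv Atom API, newest submissions first."""
    params = urllib.parse.urlencode({
        "search_query": f'all:"{query}"',
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    })
    return f"http://export.arxiv.org/api/query?{params}"

def pick_one(entries: list):
    """Randomly select a single paper from however many results came back."""
    return random.choice(entries)

def truncate(text: str, limit: int = MAX_INPUT_CHARS) -> str:
    """Cut from the end if the extracted text exceeds the budget (step 5)."""
    return text[:limit]
```

Fetching the feed, downloading the PDF, and extracting text with PyPDF2 would then plug in around these helpers.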

Creative Aspects

For the prompt content and summary format, I referenced the Ochiai method, a widely used template that summarizes a paper by answering a fixed set of questions. This can be customized according to preference.

Getting LLMs to output in JSON format was something I hadn’t tried before, so I was glad to experiment with it here.
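Even when the API is asked for JSON output (the Gemini API supports a JSON response MIME type setting), models sometimes wrap the reply in Markdown code fences, so parsing defensively is a good habit. A minimal sketch, where parse_json_reply is my own hypothetical helper rather than code from the notebook:

```python
import json

def parse_json_reply(text: str) -> dict:
    """Strip optional Markdown code fences, then parse the JSON body."""
    body = text.strip()
    if body.startswith("```"):
        # Drop the opening fence line (e.g. ```json) and the trailing fence.
        body = body.split("\n", 1)[1]
        body = body.rsplit("```", 1)[0]
    return json.loads(body)
```

This way the same parser handles both a bare JSON reply and one wrapped in a ```json fence.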

I delegated all coding to Gemini 2.5, practicing what’s known as “vibe coding.” I refined the specifications through consultations with Gemini and ChatGPT before making the request.

Here’s what my specification document looked like:

Please generate code based on the following development specifications:

# Development Specifications
## Execution Environment
Kaggle Notebook (Python)
## LLM to be Used
Google Gemini (google-genai==1.7.0)
Gemini API key already acquired
## Processing Flow
1. Search for arXiv papers
   Search keyword: Fixed as "Large language model"
   Paper language: English only
   Get the latest 100 search results and randomly select 1
   If fewer than 100 papers, select randomly from those available
2. Read PDF file
   Download PDF, extract text using PyPDF2
   Pass extracted text to Gemini API
   Maximum input tokens: 300,000 tokens. If exceeded, cut from the end
3. Summarization by Gemini
   Summarize extracted text with Gemini
   Output in the following JSON format
   JSON output format for summary results
{
  "Title": "Title of the paper",
  "Authors": "Author names",
  "Published": "YYYY-MM-DD",
  "PDF URL": "http://arxiv.org/pdf/xxxxx",
  "summary": {
    "What is the paper about?": "Provide a brief summary of the paper.",
    "What makes it impressive compared to previous studies?": "Describe its novelty, originality, and what new problems it has solved.",
    "What is the core of the technology or method?": "Explain the key technical or methodological points.",
    "How was its effectiveness verified?": "Describe the methods and results used to evaluate its effectiveness.",
    "How is this field expected to develop in the future?": "Discuss future challenges and prospects for the field."
  }
}
## Output Method
Print display on Kaggle Notebook
Also save to file for download from Notebook (e.g., summary.json)
## Gemini Summarization Accuracy and Reliability
No specific accuracy improvements or corrective measures implemented
## Error Handling/Logging
Handle only minimal errors (API communication errors, PDF acquisition errors, Gemini API errors)
No log output
## Execution Method
Manual execution by user (no regular execution or automatic scheduling)
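The output method in the spec above (print to the notebook, plus a downloadable summary.json) fits in one small helper. emit_summary and its default directory are my own naming for illustration, not the generated code:

```python
import json
from pathlib import Path

def emit_summary(summary: dict, out_dir: str = "/kaggle/working") -> Path:
    """Print the JSON summary and save it next to the notebook for download."""
    text = json.dumps(summary, ensure_ascii=False, indent=2)
    print(text)
    path = Path(out_dir) / "summary.json"
    path.write_text(text, encoding="utf-8")
    return path
```

On Kaggle, anything written under /kaggle/working/ shows up in the notebook's output files for download.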




Future Prospects

It would be even better to run the script on a schedule (for example, a Cloud Scheduler-triggered Cloud Run job) and deliver daily updates via email or Slack.

Keeping a history of summarized papers and building a database would help accumulate knowledge effectively.

It would also be possible to pick recommended papers rather than simply selecting them randomly.

In Conclusion

I am deeply grateful to Google’s organizing team for providing such a valuable learning opportunity. Though brief, the content was incredibly intensive and valuable. I was able to deepen my understanding of generative AI and acquire practical skills.

If similar events are held in the future, I would definitely like to participate. I hope to further develop the knowledge I gained and take on more complex generative AI projects.