ABD338: MirrorWeb: Powering Large-scale, Full-text Search for the UK Government Web Archives Using Amazon Elasticsearch Service
Description
MirrorWeb offers automated website and social media archiving services with full text search capability for all content. The UK government hired MirrorWeb to provide search services across 20 years of archived data from over 4,800 websites. In this session, MirrorWeb discusses the technology stack they built using Amazon Elasticsearch Service (Amazon ES) to search across the 333 million unique documents (over 120 TB) that they indexed within a 10-hour period. They discuss how they moved data from on-premises to Amazon S3 using AWS Snowball and then processed that data using Amazon EC2 Spot Instances, reducing costs by over 90%. They also talk about how they used AWS Lambda to ingest data into Amazon ES. Finally, they share best practices for building a large-scale document search architecture.
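The abstract does not show MirrorWeb's ingestion code. The sketch below is one way a Lambda function might batch documents for Amazon ES, assuming it uses Elasticsearch's `_bulk` API, which takes newline-delimited JSON: an action line, then the document source, with a required trailing newline. The index name `ukgwa-documents`, the `id` field, and the helper names are hypothetical. The sketch also works out the average throughput implied by the quoted figures. Because the abstract says "over 120 TB" and does not say whether that counts raw archive data or indexed data, the bytes-per-second figure is only a lower-bound illustration.

```python
import json
from typing import Iterable, Mapping

# Hypothetical index name; the session abstract does not say how
# MirrorWeb named or sharded its indices.
INDEX_NAME = "ukgwa-documents"


def build_bulk_body(docs: Iterable[Mapping], index: str = INDEX_NAME) -> str:
    """Serialize documents into an Elasticsearch _bulk NDJSON payload.

    Each document becomes two lines: an action/metadata line and the
    document source. The _bulk API requires the body to end with a newline.
    """
    lines = []
    for doc in docs:
        source = dict(doc)          # copy so the caller's dict is not mutated
        doc_id = source.pop("id")   # hypothetical field used as the document _id
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


def throughput(num_docs: int, total_bytes: float, hours: float) -> tuple[float, float]:
    """Return (documents per second, bytes per second) averaged over a run."""
    seconds = hours * 3600
    return num_docs / seconds, total_bytes / seconds


if __name__ == "__main__":
    body = build_bulk_body([{"id": "doc-1", "url": "https://example.gov.uk/", "text": "..."}])
    print(body, end="")
    # Figures from the abstract: 333 million documents, 120 TB (decimal), 10 hours.
    docs_s, bytes_s = throughput(333_000_000, 120e12, 10)
    print(f"{docs_s:,.0f} docs/s, {bytes_s / 1e9:.1f} GB/s")
```

At these figures the average rate is 9,250 documents per second, roughly 3.3 GB/s. In practice, each Lambda invocation would POST one such body to the domain's `_bulk` endpoint. Keeping batches to a few megabytes is the usual advice for bulk requests.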