Scraping HTML Page Data in an iOS App


This example scrapes the URLs of the funny images and GIF animations on hao123.
1. Target URL:

http://www.hao123.com/gaoxiao?pn=1

2. Fetch the HTML data, as follows:

// Synchronous call: it blocks the calling thread, and error:nil discards any failure.
NSString *htmlString = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://www.hao123.com/gaoxiao?pn=1"] encoding:NSUTF8StringEncoding error:nil];
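The one-liner above is convenient, but it runs synchronously and ignores errors. As a minimal sketch of an alternative, the fetch could go through NSURLSession instead. The helper name MYFetchHTML and its completion signature are made up for illustration and are not part of the original article; the sketch also assumes the page is UTF-8, just as the one-liner does.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original article): fetches a page
// asynchronously and passes back either the decoded HTML or an error.
static void MYFetchHTML(NSURL *url, void (^completion)(NSString *html, NSError *error)) {
    NSURLSessionDataTask *task =
        [[NSURLSession sharedSession] dataTaskWithURL:url
                                    completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
            if (error != nil || data == nil) {
                completion(nil, error);
                return;
            }
            // Assumption: the page is UTF-8. If decoding fails, html is nil
            // and the caller receives (nil, nil).
            NSString *html = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
            completion(html, nil);
        }];
    [task resume];  // data tasks start suspended
}
```

The completion block runs on a background queue, so any UI update should be dispatched back to the main queue. Note also that on iOS 9 and later, App Transport Security blocks plain http:// loads by default unless the app declares an exception.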

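3. Analyze the page content and find the marker strings that appear immediately before and after each resource link you need.
For the target page, the markers before and after each resource are:
Before:

@"<img selector=\"pic\" img-src=\""
After:
@"\" src="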

4. Extract the needed substrings from htmlString, as follows:
Add a category to NSString:

@interface NSString (MYNSStringExtensionMethods)
// Returns every substring found between an occurrence of fromString and the next toString.
- (NSArray *)componentsSeparatedFromString:(NSString *)fromString toString:(NSString *)toString;
@end

@implementation NSString (MYNSStringExtensionMethods)

- (NSArray *)componentsSeparatedFromString:(NSString *)fromString toString:(NSString *)toString
{
    // Both markers are required; return nil for nil or empty input.
    if (!fromString || !toString || fromString.length == 0 || toString.length == 0) {
        return nil;
    }
    NSMutableArray *subStringsArray = [[NSMutableArray alloc] init];
    NSString *tempString = self;
    NSRange range = [tempString rangeOfString:fromString];
    while (range.location != NSNotFound) {
        // Drop everything up to and including the opening marker.
        tempString = [tempString substringFromIndex:(range.location + range.length)];
        range = [tempString rangeOfString:toString];
        if (range.location != NSNotFound) {
            // The text before the closing marker is one result.
            [subStringsArray addObject:[tempString substringToIndex:range.location]];
            // Look for the next opening marker in the remaining text.
            range = [tempString rangeOfString:fromString];
        }
        else
        {
            // No closing marker left, so stop.
            break;
        }
    }
    return subStringsArray;
}

@end
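The same "text between two markers" extraction can also be sketched with NSScanner, which tracks the scan position for you. This is an alternative technique, not the article's category method. The function name MYSubstringsBetween is hypothetical, and unlike the category it returns an empty array (not nil) for invalid input and resumes scanning after each closing marker.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: collects every substring that sits between
// `start` and the next `end` marker in `source`.
static NSArray<NSString *> *MYSubstringsBetween(NSString *source, NSString *start, NSString *end) {
    NSMutableArray<NSString *> *results = [NSMutableArray array];
    if (source.length == 0 || start.length == 0 || end.length == 0) {
        return results;
    }
    NSScanner *scanner = [NSScanner scannerWithString:source];
    scanner.charactersToBeSkipped = nil;  // treat whitespace as ordinary text
    scanner.caseSensitive = YES;          // NSScanner is case-insensitive by default
    while (!scanner.isAtEnd) {
        // Move to the next opening marker, then step over it.
        [scanner scanUpToString:start intoString:NULL];
        if (![scanner scanString:start intoString:NULL]) {
            break;  // no more opening markers
        }
        NSString *match = nil;
        BOOL scanned = [scanner scanUpToString:end intoString:&match];
        if (scanner.isAtEnd) {
            break;  // ran off the end: no closing marker for this match
        }
        // scanned is NO when the closing marker follows immediately (empty match).
        [results addObject:(scanned ? match : @"")];
        [scanner scanString:end intoString:NULL];
    }
    return results;
}
```

For a made-up fragment such as `<img selector="pic" img-src="http://example.com/a" src="x">`, calling the function with the two markers from step 3 would yield a one-element array containing `http://example.com/a`.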

5. Get and print the resource URLs

NSArray *urls = [htmlString componentsSeparatedFromString:@"<img selector=\"pic\" img-src=\"" toString:@"\" src="];

Print them:

NSLog(@"find urls:%@", urls);

Result:

find urls: (  
http://img.hao123.com/data/3_a43d768470ea5785e5bbf3ca2c81e4a7_430,  
http://img6.hao123.com/data/3_c87ac28d85b361b5efc9654cdb24c745_430,  
http://img0.hao123.com/data/3_759c73a935eb8c3ebae5646eb71b3028_0,  
http://img.hao123.com/data/3_415b7834328e6a4fc70f50854828df22_0,  
http://img5.hao123.com/data/3_e84664284f1cbf59eb364d147fc1610f_430  
)  
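The extracted values are still plain strings cut out of the markup. Before using them (for example, to download the images), it may be worth turning them into NSURL objects and dropping anything malformed. The following is only a sketch: the helper name MYURLsFromStrings is hypothetical, and the rule "must have a scheme and a host" is an assumption about what counts as a usable link.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: converts scraped strings to NSURL objects and
// keeps only those that have both a scheme and a host.
static NSArray<NSURL *> *MYURLsFromStrings(NSArray<NSString *> *strings) {
    NSMutableArray<NSURL *> *urls = [NSMutableArray array];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    for (NSString *string in strings) {
        NSString *trimmed = [string stringByTrimmingCharactersInSet:whitespace];
        NSURL *url = [NSURL URLWithString:trimmed];
        // URLWithString: may return nil, or a URL without scheme/host, for bad input.
        if (url != nil && url.scheme.length > 0 && url.host.length > 0) {
            [urls addObject:url];
        }
    }
    return urls;
}
```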
